FOREWORD



On behalf of the Organizing Committee, it is my great pleasure 
to welcome you to the International Conference on Fifth Generation 
Computer Systems 1992. 

The Fifth Generation Computer Systems (FGCS) project was
started in 1982 on the initiative of the late Professor Tohru Moto-Oka
with the purpose of creating a revolutionary new type of computer
oriented to knowledge processing in the 1990s. After completing
the initial and intermediate stages of research and development,
we are now at the final point of our ten-year project and are rapidly
approaching the completion of prototype Fifth Generation Computer
Systems.

The research goals of the FGCS project were challenging, but 
we expect to meet most of them. We have developed a new paradigm 
of knowledge processing including the parallel logic language, KL1, 
and the parallel inference machine, PIM. 

When we look back upon these ten years, we can find many 
research areas in knowledge processing related to this project, such 
as logic programming, parallel processing, natural language process- 
ing, and machine learning. Furthermore, there emerged many new 
applications of knowledge processing, such as legal reasoning and 
genetic information processing. 

I believe that this new world of information processing will 
grow more and more in the future. When very large knowledge bases 
including common sense knowledge come out in full scale and are 
widely used, the knowledge processing paradigm will show its real 
power and will give us great rewards. From now on, we can enjoy 
fifth generation computer technology in many fields. 

Following the same objective of creating such a new paradigm, 
there has been intense international collaboration, such as joint 
workshops with France, Italy, Sweden, the U.K., and the U.S.A., and 
joint research with U.S. and Swedish institutes on parallel processing
applications.

Against this background, ICOT hosts the International Confer- 
ence on Fifth Generation Computer Systems 1992 (FGCS’92). This 
is the last in a series of FGCS conferences; previous conferences were 
held in 1981, 1984 and 1988. The purpose of the conference is to 
present the final results of the FGCS project, as well as to promote 
the exchange of new ideas in the fields of knowledge processing, 
logic programming, and parallel processing. 

FGCS’92 will take place over five days. The first two days will 
be devoted to the presentation of the latest results of the FGCS 
project, and will include invited lectures by leading researchers. The
remaining three days will be devoted to technical sessions for invited
and submitted papers, the presentation of the results of detailed 
research done at ICOT, and panel discussions. 

Professor D. Bjørner from the United Nations University,
Professor J.A. Robinson from Syracuse University, and Professor 
C.A.R. Hoare from Oxford University kindly accepted our offer to 
give invited lectures. 

Professor R. Kowalski from Imperial College is the chairperson of 
the plenary panel session on “A springboard for information proces- 
sing in the 21st century.” Professor Hajime Karatsu from Tokai 
University accepted our invitation to give a banquet speech. 

During the conference, there will be demonstrations of the 
research results from the ten-year FGCS project. The Parallel Infer- 
ence Machines and many kinds of parallel application programs will 
be highlighted to show the feasibility of the machines. 

I hope that this conference will be a good forum in which to present all of
the research results in this field to date, confirm the milestones,
and propose a future direction for the research, development
and applications of fifth generation computers through vigorous
discussions among attendees from all over the world. I hope all of
the attendees will return to their own countries with great expectations
in mind, feeling that a new era of computer science has
opened in terms of fifth generation computer systems.

Moreover, I hope that the friendship and frank cooperation
among researchers from around the world, fostered in the process of
fifth generation computer systems research, will grow and widen so
that this small but strong relationship can help promote international
collaboration for the brilliant future of mankind.

Hidehiko Tanaka 
Conference Chairperson 




FOREWORD 



Esteemed guests, let me begin by welcoming you to the International Conference on 
Fifth Generation Computer Systems, 1992. I am Hideaki Kumano. I am the Director 
General of the Machinery and Information Industries Bureau of MITI. 

We have been promoting the Fifth Generation Computer Systems Project, with the 
mission of international contributions to technological development by promoting the 
research and development of information technology in the basic research phase and 
distributing the achievements of that research worldwide. This international conference 
is thus of great importance in making our achievements available to all. It is, therefore, 
a great honor for me to be given the opportunity to deliver the keynote speech today.



1 Achievements of the Project 

Since I took up my current post, I have had several opportunities to visit the project site. 
This made a great impression on me since it proved to me that Japanese technology can 
produce spectacular results in an area of highly advanced technology, covering the fields 
of parallel inference machine hardware and its basic software such as operating systems 
and programming languages; fields in which no one had any previous experience. 

Furthermore, I caught a glimpse of the future use of fifth generation computer tech- 
nology when I saw the results of its application to genetics and law. I was especially 
interested in the demonstration of the parallel legal inference system, since I have been 
engaged in the enactment and operation of laws at MITI. I now believe that the machines 
using the concepts of fifth generation computers will find practical applications in the 
enactment and operation of laws in the near future. 

The research and development phase of our project will be completed by the end 
of this fiscal year. We will evaluate all the results. The committee for development of 
basic computer technology, comprised of distinguished members selected from a broad 
spectrum of fields, will make a formal evaluation of the project. This evaluation will take 
into account the opinions of those attending the conference, as well as the results of a 
questionnaire completed by overseas experts in each field. Even before this evaluation, 
however, I am convinced that the project has produced results that will have a great 
impact on future computer technology. 

2 Features of the Fifth Generation Computer Systems Project 

I will explain how we set our goals and developed a scheme that would achieve these 
high-level technological advances. 

The commencement of the project coincided with the time when Japan was coming 
to be recognized as a major economic and technological power in the world community. 
Given these circumstances, the objectives of the project included not only the develop- 
ment of original and creative technology, but also the making of valuable international 
contributions. In this regard, we selected a theme of “knowledge information processing”,
which would have a major impact on a wide area from technology through to the
economy. The project took as its research goal the development of a parallel inference 
system, representing the paradigm of computer technology as applied to the theme. 

The goal was particularly challenging at that time. I recall the words of a participant
at the first conference held in 1981. He commented that it was doubtful whether
Japanese researchers could succeed in such a project since we, at that time, had very
little experience in these fields.

However, despite the difficulties of the task ahead of us, we promoted the project 
from the viewpoint of contributing to the international community through research. In 
this regard, our endeavors in this area were targeted as pre-competitive technologies, 
namely basic research. This meant that we would have to start from scratch, assembling 
and training a group of researchers. 

To achieve our goal of creating a paradigm of new computer technology, taking an 
integrated approach starting from basic research, we settled on a research scheme after 
exhaustive preliminary deliberations. 

As part of its efforts to promote the dissemination of basic research results as inter- 
national public assets, the government of Japan, reflecting its firm commitment to this 
area, decided to finance all research costs. 

The Institute for New Generation Computer Technology (ICOT), the sponsor of this 
conference, was established to act as a central research laboratory where brainpower 
could be concentrated. Such an organization was considered essential to the development 
of an integrated technology that could be applied to both hardware and software. The 
Institute’s research laboratory, which actually conducted the project’s research and development,
was founded exactly ten years ago today, on June 1, 1982. A number of
highly qualified personnel, all of whom were excited by the ideal that the project pursued, 
were recruited from the government and industry. Furthermore, various ad hoc groups 
were formed to promote discussions among researchers in various fields, making ICOT 
the key center for research communication in this field. 

The duration of the project was divided into three phases. Reviews were conducted 
at the end of each phase, from the viewpoint of human resources and technological ad- 
vances, which made it possible to entrust various areas of the research. I believe that 
this approach increased efficiency, and also allowed flexibility by eliminating redundant 
areas of research. 

We have also been heavily involved in international exchanges, with the aim of pro- 
moting international contributions. Currently, we are involved in five different interna- 
tional research collaboration projects. These include work in the theorem proving field 
with the Australian National University (ANU), and research into constraint logic pro- 
gramming with the Swedish Institute of Computer Science (SICS). The results of these 
two collaborations, on display in the demonstration hall, are excellent examples of what 
research collaboration can achieve. We have also promoted international exchange by 
holding international conferences and by hosting researchers from abroad at ICOT. We
have also gone to great lengths to make public our project’s achievements, including
intermediate results.



3 Succession of the Project’s Ideal 

This project is regarded as being the prototype for all subsequent projects to be sponsored 
by MITI. 

It is largely the herculean efforts of the researchers, under the leadership of Dr.
Fuchi and other excellent research leaders, that have led to the revolutionary advances
being demonstrated at this conference.

In the light of these achievements, and with an eye to the future, I can now state 
that there is no question of the need to make international contributions the basis of the 
policies governing future technological development at MITI. This ideal will be passed 
on to all subsequent research and development projects. 

A case in point is the Real World Computing (RWC) project scheduled to start this 
year. This project rests on a foundation of international cooperation. Indeed, the basic 
plan, approved by a committee a few days ago, specifically reflects the international 
exchange of opinions. The RWC project is a particularly challenging project that aims 
to investigate the fundamental principles of human-like flexible information processing 
and to implement it as a new information processing technology, taking full advantage 
of advancing hardware technologies. We will not fail to make every effort to achieve the 
project’s objectives for use as common assets for all mankind. 

4 International Response 

As I mentioned earlier, I believe that the Fifth Generation Computer Systems Project
has made valuable international contributions from its earliest stages. The project has 
stimulated international interest and responses from its outset. The great number of 
foreign participants present today illustrates this point. 

Around the world, a number of projects received their initial impetus from our 
project: these include the Strategic Computing Initiative in the U.S.A., the EC’s Esprit
project, and the Alvey Project in the United Kingdom.

These projects were initially launched to compete with the Fifth Generation Com- 
puter Systems Project. Now, however, I strongly believe that since our ideal of inter- 
national contributions has come to be understood around the globe, together with the 
realization that technology cannot and should not be divided by borders, each project
is providing the stimulus for the others, and all are making major contributions to the 
advancement of information processing technologies. 

5 Free Access to the Project’s Software 

One of the great virtues of science, given an open environment, is the collaboration 
between researchers using a common base of technology. 




Considering this, it would be impractical for one person or even one nation to attempt 
to cover the whole range of technological research and development. Therefore, the 
necessity of international cooperation is self-evident from the standpoint of advancing 
the human race as a whole. 

In this vein, MITI has decided to promote technology globalism in the fields of science 
and technology, based on a concept of “international cooperative effort for creative ac- 
tivity and international exchange to maximize the total benefit of science and technology 
to mankind.” We call this concept “techno-globalism”. 

It is also important to establish an environment based on “techno-globalism”, that 
supports international collaboration in basic and original research as a resource to solve 
problems common to all mankind as well as the dissemination of the resulting achieve- 
ments. This could be done through international cooperation. 

To achieve this “techno-globalism” all countries should, as far as possible, allow free 
and easy access to their domestic technologies. This kind of openness requires the volun- 
tary establishment of environments where anyone can access technological achievements 
freely, rather than merely asking other countries for information. It is this kind of inter- 
national cooperation, with the efforts of both sides complementing each other, that can 
best accelerate the advancement of technology. 

We at MITI have examined our policies from the viewpoint of promoting international 
technological advancement by using the technologies developed as part of this project, 
the excellence of which has encouraged us to set a new policy.

Our project’s resources focused mainly on a variety of software, including parallel 
operating systems and parallel logic programming languages. To date, the results of such 
a national project, sponsored by the government, were available only for a fee and could 
be used only under various conditions once they became the property of the government. 
Therefore, generally speaking, although the results have been available to the public, in 
principle, they have not been available to be used freely and widely. 

As I mentioned earlier, in the push toward reaching the goal of promoting inter- 
national cooperation for technological advancement, Japan should take the initiative in 
creating an environment where all technologies developed in this project can be accessed 
easily. Now, I can formally announce that, concerning software copyrights in the research
and development phase which are not the property of the government, the Institute for
New Generation Computer Technology (ICOT), the owner of these software copyrights,
is now preparing to enable their free and open use without charge.

The adoption of this policy not only allows anyone free access to the software technologies
developed as part of the project, but also makes it possible for interested parties
to inherit the results of our research, to further advance the technology. I sincerely hope
that our adopting this policy will maximize the utilization of researchers’ abilities, and 
promote the advancement of the technologies of knowledge information processing and 
parallel processing, toward which all efforts have been concentrated during the project. 

This means that our adopting this policy will not merely result in a one-way flow 
of technologies from Japan, but enhance the benefit to all mankind of the technological 
advancements brought on by a two-way flow of technology and the mutual benefits thus 
obtained.

I should say that, from the outset of the Fifth Generation Computer Systems Project, 
we decided to make international contributions an important objective of the project. We
fashioned the project as the model for managing the MITI-sponsored research and devel- 
opment projects that were to follow. Now, as we near the completion of the project, we 
have decided to adopt a policy of free access to the software to inspire further international 
contributions to technological development. 

I ask all of you to understand the message in this decision. I very much hope that the 
world’s researchers will make effective use of the technologies resulting from the project 
and will devote themselves to further developing the technologies. 

Finally, I’d like to close by expressing my heartfelt desire for this international conference
to succeed in providing a productive forum for information exchange between
participants and to act as a springboard for further advancements. 

Thank you very much for bearing with me. 

Hideaki Kumano 
Director General 
Machinery and Information Industries Bureau 
Ministry of International Trade and Industry (MITI) 




PREFACE 



Ten years have passed since the FGCS project was launched 
with the support of the Japanese government. As soon as the FGCS 
project was announced it had a profound effect not only on com- 
puter scientists but also on the computer industry. Many countries 
recognized the importance of the FGCS project and some of them 
began their own similar national projects. 

The FGCS project was initially planned as a ten-year project,
and this fourth and final FGCS conference therefore has historical
significance. For this reason the conference includes an ICOT session.
The first volume contains a plenary session and the ICOT session. 
The plenary session is composed of many reports on the FGCS 
project with three invited lectures and a panel discussion. 

In the ICOT session, the logic-based approach and parallel
processing will be emphasized through concrete discussions. In
addition, many demonstration programs have been prepared
by ICOT at the conference site, and participants are invited to visit
and discuss these exhibitions. Through the ICOT session and the
exhibitions, participants will gain a clear understanding of the aims and
results of the FGCS project and a solid picture of FGCS.

The second volume is devoted to the technical sessions, which
consist of three invited papers and technical papers submitted to this
conference. Due to the time and space limitations of the conference,
only 82 papers out of 256 submissions were selected by the program
committee after long and careful discussion of the many
high-quality papers submitted.

It is our hope that the conference program will prove to be both 
worthwhile and enjoyable. As program chairperson, it is my great
pleasure to acknowledge the support of a number of people. First of 
all, I would like to give my sincere thanks to the program committee 
members who put a lot of effort into making the program attractive. 
I owe much to the three program vice-chairpersons, Professor 
Makoto Amamiya, Dr. Shigeki Goto and Professor Fumio Mizogu- 
chi. Many ICOT members, including Dr. Kazunori Ueda, Ken 
Satoh, Keiji Hirata, and Hideki Yasukawa have worked as key 
persons to organize the program. Dr. Koichi Furukawa, in particu- 
lar, has played an indispensable role in overcoming many problems. 
I would also like to thank the many referees from many countries
who replied quickly to the referee sheets.

Finally, I would like to thank the secretariat at ICOT, who
made fantastic efforts to carry out the administrative tasks efficiently.

Hozumi Tanaka 
Program Chairperson 
Launching the New Era

Kazuhiro Fuchi 
Thank you for coming to FGCS’92. As you know, we
have been conducting a ten-year research project on fifth 
generation computer systems. Today is the tenth an- 
niversary of the founding of our research center, making 
it exactly ten years since our project actually started. 

The first objective of this international conference is to 
show what we have accomplished in our research during 
these ten years. 

Another objective of this conference is to offer an op- 
portunity for researchers to present the results of ad- 
vanced research related to Fifth Generation Computer 
Systems and to exchange ideas. A variety of innovative 
studies, in addition to our own, are in progress in many 
parts of the world, addressing the future of computers 
and information processing technologies. 

I constantly use “Parallel Inference” as the
key phrase to describe simply and precisely the technological
goal of this project. Our hypothesis is that parallel
inference technology will provide the core for those new 
technologies in the future — technologies that will be able 
to go beyond the framework of conventional computer 
technologies. 

During these ten years I have tried to explain this idea 
whenever I have had the chance. One obvious reason why 
I have repeated the same thing so many times is that 
I wish its importance to be recognized by the public. 
However, I have another, less obvious, reason. 

When this project started, an exaggerated image of 
the project was engendered, which seems to persist even 
now. For example, some people believed that we were 
trying, in this project, to solve in a mere ten years some 
of the most difficult problems in the field of artificial in- 
telligence (AI), or to create a machine translation system 
equipped with the same capabilities as humans. 

In those days, we had to face criticism, based upon 
that false image, that it was a reckless project trying 
to tackle impossible goals. Now we see criticism, from 
inside and outside the country, that the project has failed 
because it has been unable to realize those grand goals. 

The reason why such an image was born appears to 
have something to do with FGCS’81 — a conference we 
held one year before the project began. At that conference
we discussed many different dreams and concepts.
The substance of those discussions was reported as sen- 
sational news all over the world. 

A vision with such ambitious goals, however, can never 
be materialized as a real project in its original form. 
Even if a project is started in accordance with the origi- 
nal form, it cannot be managed and operated within the 
framework of an effective research scheme. Actually, our 
plans had become much more modest by the time the 
project was launched. 

For example, the development of application systems, 
such as a machine translation system, was removed from 
the list of goals. It is impossible to complete a highly 
intelligent system in ten years. A preliminary stage is 
required to enhance basic studies and to reform com- 
puter technology itself. We decided that we should focus 
our efforts on these foundational tasks. Another rea- 
son is that, at that time in Japan, some private compa- 
nies had already begun to develop pragmatic, low-level 
machine-translation systems independently and in com- 
petition with each other. 

Most of the research topics related to pattern recog- 
nition were also eliminated, because a national project 
called “Pattern Information Processing” had already 
been conducted by the Ministry of International Trade 
and Industry for ten years. We also found that the stage 
of the research did not match our own. 

We thus deliberately eliminated most research top- 
ics covered by Pattern Information Processing from the 
scope of our FGCS project. However, those topics them- 
selves are very important and thus remain major topics 
for research. They may become a main theme of another 
national project of Japan in the future. 

Does all this mean that FGCS’81 was deceptive? I 
do not think so. First, in those days, a pessimistic out- 
look predominated concerning the future development of 
technological research. For example, there was a prevailing
view that research into artificial intelligence would be of
no practical use. In that situation, there was considerable
value in maintaining a positive attitude toward
the future of technological research — whether this meant 
ten years or fifty. I believe that this was the very reason 
why we received remarkable reactions, both positive and
negative, from the public. 

The second reason is that the key concept of Parallel 
Inference was presented in a clear-cut form at FGCS’81. 
Let me show you a diagram (Figure 1). This diagram is 
the one I used for my speech at FGCS’81, and is now a 
sort of “ancient document.” Its draft was completed in 
1980, but I had come up with the basic idea four years 
earlier. After discussing the concept with my colleagues 
for four years, I finally completed this diagram. 

Here, you can clearly see our concept that our goal 
should be a “Parallel Inference Machine.” We wanted 
to create an inference machine, starting with study on 
a variety of parallel architectures. For this purpose, re- 
search into a new language was necessary. We wanted to 
develop a 5G-kernel language — what we now call KL1. 
The diagram includes these hopes of ours. 

The upper part of the diagram shows the research in- 
frastructure. A personal inference machine or worksta- 
tion for research purposes should be created, as well as a 
chip for the machine. We expected that the chip would 
be useful for our goal. The computer network should be 
consolidated to support the infrastructure. The software 
aspects are shown in the bottom part of the diagram. 
Starting with the study on software engineering and AI, 
we wanted to build a framework for high-level symbol 
processing, which should be used to achieve our goal. 
This is the concept I presented at the FGCS’81 confer- 
ence. 

I would appreciate it if you would compare this di- 
agram with our plan and the results of the final stage 
of this project, when Deputy Director Kurozumi shows 
you them later. I would like you to compare the original 
structure conceived 12 years ago and the present results 
of the project so that you can appreciate what has been 
accomplished and criticize what is lacking or what was 
immature in the original idea. 

Some people tend to make more of the conclusions 
drawn by a committee than the concepts and beliefs of 
an individual. It may sound a little bit beside the point, but
I have heard that there is a proverb in the West that 
goes, “The horse designed by a committee will turn out 
to be a camel.” 

The preparatory committee for this project had a se- 
ries of enthusiastic discussions for three years before the 
project’s launching. I thought that they were doing an 
exceptional job as a committee. Although the commit- 
tee’s work was great, however, I must say that the plan 
became a camel. It seems that their enthusiasm cre- 
ated some extra humps as well. Let me say in passing 
that some people seem to adhere to those humps. I am 
surprised that there is still such a so-called bureaucratic 
view even among academic people and journalists. 

This is not the first time I have expressed this opinion 
of mine about the goal of the project. I have, at least 
in Japanese, been declaring it in public for the past ten
years. I think I could have been discharged at any time
had my opinion been inappropriate. 

As the person in charge of this project, I have pushed
forward along the lines of Parallel Inference based upon
my own beliefs. Although I have been criticized as still 
being too ambitious, I have always been prepared to take 
responsibility for that. 

Since the project is a national project, it goes without 
saying that it should not be controlled by one person. I 
have had many discussions with a variety of people for 
more than ten years. Fortunately, the idea of the project 
has not remained just a personal belief but has become 
a common belief shared by the many researchers and 
research leaders involved in the project. 

Assuming that this project has proved to be successful, 
as I believe it has, this fact is probably the biggest reason 
for its success. For a research project to be successful, it 
needs to be favored by good external conditions. But the 
most important thing is that the research group involved 
has a common belief and a common will to reach its 
goals. I have been very fortunate to be able to realize 
and experience this over the past ten years. 

So much for introductory remarks. I wish to outline, in 
terms of Parallel Inference, the results of our work con- 
ducted over these ten years. I believe that the remarkable 
feature of this project is that it focused upon one lan- 
guage and, based upon that language, experimented with 
the development of hardware and software on a large 
scale. 

From the beginning, we envisaged that we would take 
logic programming and give it a role as a link that con- 
nects highly parallel machine architecture and the prob- 
lems concerning applications and software. Our mission 
was to find a programming language for Parallel Infer- 
ence. 

A research group led by Deputy Director Furukawa 
was responsible for this work. As a result of their ef- 
forts, Ueda came up with a language model, GHC, at 
the beginning of the intermediate stage of the project. 
Its two main precursors were Parlog and Concurrent
Prolog. He enhanced and simplified them to make
this model. Based upon GHC, Chikayama designed a 
programming language called KL1. 

KL1, a language derived from the logic programming 
concept, provided a basis for the latter half of our 
project. Thus, all of our research plans in the final stage 
were integrated under a single language, KL1. 

For example, we developed a hardware system, the 
Multi-PSI, at the end of the intermediate stage, and 
demonstrated it at FGCS’88. After the conference we 
made copies and have used them as the infrastructure 
for software research. 

In the final stage, we built a few prototypes of the PIM, the
Parallel Inference Machine that has been one of our final
research goals on the hardware side. These prototypes
are being demonstrated at this conference.





Each prototype has a different architecture in its in- 
terconnection network and so forth, and the architecture 
itself is a subject of research. Viewed from the outside, 
however, all of them are KL1 machines. 

Division Chief Uchida and Laboratory Chief Taki will 
show you details on PIM later. What I want to em- 
phasize here is that all of these prototypes are designed, 
down to the level of internal chips, with the assumption 
that KL1, a language that could be categorized as a very 
high-level language, is a “machine language.” 

On the software side as well, our research topics were 
integrated under the KL1 language. All the application 
software, as well as the basic software such as operating 
systems, were to be written in KL1. 



We demonstrated an operating system called PIMOS 
at FGCS’88, which was the first operating system soft- 
ware written in KL1. It was immature at that time, but 
has been improved since then. The full-fledged version 
of PIMOS now securely backs the demonstrations being 
shown at this conference. 

Details will later be given by Laboratory Chief 
Chikayama, but I wish to emphasize that not only have 
we succeeded in writing software as complicated and 
huge as an operating system entirely in KL1, but we 
have also proved through our own experience that KL1 
is much more appropriate than conventional languages 
for writing system software such as operating systems. 

One of the major challenges in the final stage was to 
demonstrate that KL1 is effective not only for basic 
software, such as operating systems and language 
implementations, but also for a variety of applications. As 
Laboratory Chief Nitta will report later, we have been able to 
demonstrate the effectiveness of KL1 for various appli- 
cations including LSI-CAD, genetic analysis, and legal 
reasoning. These application systems address issues in 
the real world and have a virtually practical scale. But, 
again, what I wish to emphasize here is that the objec- 
tive of those developments has been to demonstrate the 
effectiveness of Parallel Inference. 

In fact, it was in the initial stage of our project that we 
first tried the approach of developing a project around 
one particular language. The technology was at the level 
of sequential processing, and we adopted ESP, an ex- 
panded version of Prolog, as a basis. 

Assuming that ESP could play the role of KL0, our 
kernel language for sequential processing, a Personal 
Sequential Inference machine, called PSI, was designed as 
hardware. We decided to use the PSI machine as a 
workstation for our research. Some 500 PSIs, including 
modified versions, have so far been produced and used in the 
project. 

SIMPOS, the operating system designed for PSI, is 
written solely in ESP. In those days, this was one of 
the largest programs written in a logic programming lan- 
guage. 

Up to the intermediate stage of the project, we used 
PSI and SIMPOS as the infrastructure to conduct re- 
search on expert systems and natural language process- 
ing. 

This kind of approach is indeed the dream of re- 
searchers, but some of you may be skeptical about our 
approach. Our project, though conducted on a large 
scale, is still considered basic research. Accordingly, it is 
supposed to be conducted in a free, unrestrained atmo- 
sphere so as to bring about innovative results. Some of 
you may wonder whether the policy of centering around 
one particular language restrains the freedom and diver- 
sity of research. 

But this policy is also based upon my, or our, philos- 
ophy. I believe that research is a process of “assuming 
and verifying hypotheses.” If this is true, the hypotheses 
must be as pure and clear as possible. If not, you cannot 
be sure of what you are trying to verify. 

A practical system itself could include compromise or, 
to put it differently, flexibility to accommodate various 
needs. However, in a research project, the hypotheses 
must be clear and verifiable. Compromises and the like 
could be considered after basic research results have been 
obtained. This has been my policy from the very begin- 
ning, and that is the reason why I took a rather contro- 
versial or provocative approach. 

We had a strong belief that our hypothesis of focusing 
on Parallel Inference and KL1 had sufficient scope for a 
world of rich and free research. Even if the hypothesis 



acted as a constraint, we believed that it would act as a 
creative constraint. 

I would be a liar if I were to say that there was no 
resistance among our researchers when we decided upon 
the above policy. KL1 and parallel processing were a 
completely new world to everyone. It required a lot of 
courage to plunge headlong into this new world. But 
once the psychological barrier was overcome, the re- 
searchers set out to create new parallel programming 
techniques one after another. 

People may not feel like using new programming lan- 
guages such as KL1. Using established languages and 
systems only, or a kind of conservatism, seems to be the 
major trend today. In order to make a breakthrough into 
the future, however, we need a challenging and adventurous 
spirit. I think we have carried out our experiment 
with such a spirit throughout the ten-year project. 

Among the many other results we obtained in the fi- 
nal stage was a fast theorem-proving system, or a prover. 
Details will be given in Laboratory Chief Hasegawa’s re- 
port, but I think that this research will lead to the res- 
urrection of theorem-proving research. 

Conventionally, research into theorem proving by com- 
puters has been criticized by many mathematicians who 
insisted that only toy examples could be dealt with. 
However, very recently, we were able to solve a problem 
labelled by mathematicians as an ‘open problem’ using 
our prover, as a result of collaborative research with Aus- 
tralian National University. 

The application of our prover is not limited to 
mathematical theorem proving; it is also being used as the 
inference engine of our legal reasoning system. Thus, 
our prover is being used in the mathematics world on 
one hand, and the legal world on the other. 

The research on programming languages has not ended 
with KL1. For example, a constraint logic programming 
language called GDCC has been developed as a higher- 
level language than KL1. We also have a language called 
Quixote. 

From the beginning of this project, I have advocated 
the idea of integrating three types of languages — logic, 
functional, and object-oriented — and of integrating the 
worlds of programming and of databases. This idea has 
been materialized in the Quixote language; it can be 
called a deductive object-oriented database language. 

Another language, CIL, was developed by Mukai in the 
study of natural language processing. CIL is a semantics 
representation language designed to be able to deal with 
situation theory. Quixote incorporates CIL in a natural 
form and therefore has the characteristics of a semantics 
representation language. As a whole, it shows one possi- 
ble future form of knowledge representation languages. 

More details on Quixote, along with the development 
of a distributed parallel database management system, 
Kappa-P, will be given by Laboratory Chief Yokota. 

Thus far I have outlined, albeit briefly, the final results 
of our ten-year project. Recalling what I envisaged ten 
years ago and what I have dreamed and hoped would 
materialize for 15 years, I believe that we have achieved 
as much as or more than what I expected, and I am quite 
satisfied. 

Naturally, a national project is not performed for mere 
self-satisfaction. The original goal of this project was to 
create the core of next-generation computer technolo- 
gies. Various elemental technologies are needed for fu- 
ture computers and information processing. Although it 
is impossible for this project alone to provide all of those 
technologies, we are proud to be able to say that we have 
created the core part, or at least provided an instance of 
it. 

The results of this project, however, cannot be com- 
mercialized as soon as the project is finished, which is 
exactly why it was conducted as a national project. I 
estimate that it will take us another five years, which could 
be called a period for the “maturation of the technolo- 
gies”, for our results to actually take root in society. I 
had this prospect in mind when this project started ten 
years ago, and have kept declaring it in public right up 
until today. Now the project is nearing its end, but my 
idea is still the same. 

There is often a gap of ten or twenty years between the 
basic research stage of a technology and the day it ap- 
pears in the business world. Good examples are UNIX, 
C, and RISC, which have become popular in the current 
trend toward downsizing. They appear to be up-to-date 
in the business world, but research on them has been 
conducted for many years. The frank opinion of the 
researchers involved would be that industry has finally caught 
up with their research. 

There is thus a substantial time lag between basic re- 
search and commercialization. Our project, from its very 
outset, set an eye on technologies for the far distant fu- 
ture. Today, the movement toward parallel computers 
is gaining momentum worldwide as a technology leading 
into the future. However, skepticism was dominant ten 
years ago. The situation was not very different even five 
years ago. When we tried to shift our focus to parallel 
processing after the initial stage of the project, there was 
a strong opinion that a parallel computer was not possi- 
ble and that we should give it up and be happy with the 
successful results obtained in the initial stage. 

In spite of the skepticism about parallel computers 
that still remains, the trend seems to be changing dras- 
tically. Thanks to constant progress in semiconductor 
technology, it is now becoming easier to connect five hun- 
dred, a thousand, or even more processor chips, as far as 
hardware technology is concerned. 

Currently, the parallel computers that most people are 
interested in are supercomputers for scientific computa- 
tion. The ideas there tend to still be vague regarding the 
software aspects. Nevertheless, a new age is dawning. 

The software problem might not be too serious as long 



as scientific computation deals only with simple, scaled- 
up matrix calculations, but it will certainly become se- 
rious in the future. Now suppose this problem has been 
solved and we can nicely deal with all the aspects of 
large-scale problems with complicated overall structures. 
Then, we would have something like a general-purpose 
capability that is not limited to scientific computation. 
We might then be able to replace the mainframe com- 
puters we are using now. 

The scenario mentioned above is one possibility lead- 
ing to a new type of mainframe computer in the future. 
One could start by connecting a number of processor 
chips and face enormous difficulties with parallel soft- 
ware. 

However, he or she could alternatively start by con- 
sidering what technologies will be required in the future, 
and I suspect that the answer should be the Parallel In- 
ference technology which we have been pursuing. 

I am not going to press the above view upon you. How- 
ever, I anticipate that if anybody starts research without 
knowing our ideas, or under a philosophy that he or she 
believes is quite different from ours, after many twists 
and turns that person will reach more or less the same 
concept as ours — possibly with small differences such as 
different terminology. In other words, my opinion is that 
there are not so many different essential technologies. 

It may be valuable for researchers to struggle through 
a process of research independently from what has al- 
ready been done, finally to find that they have followed 
the same course as somebody else. But a more efficient 
approach would be to build upon what has been done in 
this FGCS project and devote energy to moving forward 
from that point. I believe the results of this project will 
provide important insights for researchers who want to 
pursue general-purpose parallel computers. 

This project will be finished at the end of this year. 
As for “maturation of the Parallel Inference technol- 
ogy”, I think we will need a new form of research activ- 
ities. There is a concept called “distributed cooperative 
computing” in the field of computation models. I ex- 
pect that, in a similar spirit, the seeds generated in this 
project will spread both inside and outside the country 
and sprout in many different parts of the world. 

For this to be realized, the results of this project must 
be freely accessible and available worldwide. In the soft- 
ware area, for example, this means that it is essential 
to disclose all our accomplishments including the source 
codes and to make them “international common public 
assets.” 

MITI Minister Watanabe and the Director General of 
the Bureau announced the policy that the results of our 
project could be utilized throughout the world. Enor- 
mous effort must have been made to formulate such a 
policy. I find it very impressive. 

We have tried to encourage international collabora- 
tion for ten years in this project. As a result, we have 




enjoyed opportunities to exchange ideas with many re- 
searchers involved in advanced studies in various parts of 
the world. They have given us much support and coop- 
eration, without which this project could not have been 
completed. 

In that regard, and also considering that this is a 
Japanese national project that aims at making a contri- 
bution, though it may only be small, toward the future of 
mankind, we believe that we are responsible for leaving 
our research accomplishments as a legacy to future gen- 
erations and to the international community in a most 
suitable form. This is now realized, and I believe it is an 
important springboard for the future. 

Although this project is about to end, the end is just 
another starting point. The advancement of computers 
and information processing technologies is closely related 
to the future of human society. Social thought, ideolo- 
gies, and social systems that fail to recognize its signifi- 
cance will perish as we have seen in recent world history. 
We must advance into a new age now. To launch a new 
age, I fervently hope that the circle of those who share 
our passion for a bright future will continue to expand. 
Thank you. 
Abstract 

This paper describes how the FGCS Project 
started, its overall activities, and its results. 
The FGCS Project was launched in 
1982 after a three-year preliminary study stage. 
The basic framework of the fifth generation 
computer is parallel processing and inference 
processing based on logic programming. Fifth 
generation computers were viewed as suitable for 
the knowledge information processing needs of the 
near future. ICOT was established to promote the 
FGCS Project. This paper describes not only ICOT's 
efforts in promoting the FGCS Project, but also the 
relationship between ICOT and related 
organizations. I also offer some conjectures on the 
parallel inference machines of the near future. 

1 Preliminary Study Stage for 
the FGCS Project 

The circumstances prevailing during the 
preliminary stage of the FGCS Project, from 1979 to 
1981, can be summarized as follows. 

• Japanese computer technologies had reached the 
level of the most up-to-date overseas computer 
technologies. 

• A change in the role of the Japanese national 
project for computer technologies was being 
discussed. The project would move away from 
improving industrial competitiveness by 
catching up with the latest European computer 
technologies, and toward a world-wide scientific 
contribution through the risky development of 
leading computer technologies. 

In this situation, the Japanese Ministry of 
International Trade and Industry (MITI) started a 
study on a new project - the Fifth Generation 
Computer Project. This term expressed MITI's will 
to develop leading technologies that would progress 
beyond the fourth generation computers due to 
appear in the near future and which would 
anticipate upcoming trends. 

The Fifth Generation Computer Research 
Committee and its subcommittee (Figure 1-1) were 
established in 1979. It took until the end of 1981 to 
decide on target technologies and a framework for 
the project. 




Figure 1-1 Organization of the Fifth Generation 
Computer Committee 

Well over one hundred meetings were held with a 
similar number of committee members 
participating. The following important near-future 
computer technologies were discussed. 

• Inference computer technologies for knowledge 
processing 

• Computer technologies to process large-scale 
data bases and knowledge bases 

• High performance workstation technologies 

• Distributed functional computer technologies 

• Super-computer technologies for scientific 
calculation 

These computer technologies were investigated and 
discussed from the standpoints of international 
contribution through the development of original 
Japanese technologies, importance as future 
technologies, social needs, and conformance with 
Japanese governmental policy for the national project. 

Through these studies and discussions, the 
committee decided on the objectives of the project by 
the end of 1980, and continued with further studies of 
technical matters, social impact, and project 
schemes. 

The committee’s proposals for the FGCS Project 
are summarized as follows. 

(1) The concept of the Fifth Generation Computer: 
To have parallel (non-Von Neumann) 
processing and inference processing using 
knowledge bases as basic mechanisms. In order 
to have these mechanisms, the hardware and 
software interface is to be a logic programming 
language (Figure 1-2). 

(2) The objectives of the FGCS project: To develop 
these innovative computers, capable of 
knowledge information processing and to 
overcome the technical restrictions of 
conventional computers. 

(3) The goals of the FGCS project: To research and 
develop a set of hardware and software 
technologies for FGCS, and to develop an FGCS 
prototype system consisting of a thousand 
element processors with inference execution 
speeds of between 100M LIPS and 1G LIPS 
(Logical Inferences Per Second). 

(4) R&D period for the project: Estimated to be 10 
years, divided into three stages. 

• 3-year initial stage for R&D of basic 
technologies 

• 4-year intermediate stage for R&D of 
subsystems 

• 3-year final stage for R&D of total prototype 
system 

Figure 1-2 Concept of the Fifth Generation 
Computer 

MITI decided to launch the Fifth Generation 
Computer System (FGCS) project as a national 
project for new information processing, and made 
efforts to acquire a budget for the project. 

At the same time, the international conference 
FGCS '81 was prepared and held in October 1981 to 
announce these results and to hold discussions on 
the topic with foreign researchers. 

2 Overview of R&D Activities 
and Results of the FGCS 
Project 

2.1 Stages and Budgeting in the FGCS 
Project 

The FGCS project was designed to investigate a 
large number of unknown technologies that were 
yet to be developed. Since this involved a number of 
risky goals, the project was scheduled over a 
relatively long period of ten years. This ten-year 
period was divided into three stages. 

- In the initial stage (fiscal 1982-1984), the 
purpose of R&D was to develop the basic 
computer technologies needed to achieve the 
goal. 

- In the intermediate stage (fiscal 1985-1988), the 
purpose of R&D was to develop small to medium 
subsystems. 

- In the final stage (fiscal 1989-1992), the purpose 
of R&D was to develop a total prototype system. 
The final stage was initially planned to be three 
years. After reexamination halfway through the 
final stage, this stage was extended to four years 
to allow evaluation and improvement of the total 
system in fiscal year 1992. Consequently, the 
total length of this project has been extended to 
11 years. 
Figure 2-1 Budgets for the FGCS project 



Each year the budget for the following year's R&D 
activities was decided. MITI made great efforts in 
negotiating each year's budget with the Ministry of 
Finance. The budgets for each year, which are all 
covered by MITI, are shown in Figure 2-1. The total 
budget for the 3-year initial stage was about 8 
billion yen. For the 4-year intermediate stage, it 
was about 22 billion yen. The total budget for 1989 
to 1991 was around 21 billion yen. The budget for 
1992 is estimated to be 3.6 billion yen. 





Consequently, the total budget for the 11-year 
period of the project will be about 54 billion yen. 
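
As a rough check of the figures quoted above, the stage budgets can be added up in a few lines. The sketch below is only an illustration: it is written in Python, which plays no part in the project itself, and the names `stage_budgets` and `summarize` are assumptions introduced here. The inputs are the rounded figures from the text, so the sum comes out slightly above the "about 54 billion yen" stated above.

```python
# Stage lengths (years) and budgets (billions of yen) as quoted in the text.
# All amounts are the rounded values given there, not exact accounts.
stage_budgets = {
    "initial (fiscal 1982-1984)": (3, 8.0),
    "intermediate (fiscal 1985-1988)": (4, 22.0),
    # 1989-1991 total plus the estimate for 1992
    "final (fiscal 1989-1992)": (4, 21.0 + 3.6),
}


def summarize(budgets):
    """Return total years, total budget, and the average yearly budget per stage."""
    total_years = sum(years for years, _ in budgets.values())
    total_budget = sum(amount for _, amount in budgets.values())
    yearly = {name: amount / years for name, (years, amount) in budgets.items()}
    return total_years, total_budget, yearly


total_years, total_budget, yearly = summarize(stage_budgets)
print(f"{total_years} years, about {total_budget:.1f} billion yen in total")
for name, average in yearly.items():
    print(f"  {name}: about {average:.2f} billion yen per year")
```

The per-year averages are derived values, not figures from Figure 2-1; they only indicate how the yearly funding level rose from the initial stage to the later stages.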



2.2 R&D subjects of each stage 

At the beginning, it was considered that an R&D 
plan could not be decided in detail for a period 
as long as ten years. The R&D goals and the means 
to reach these goals were therefore not decided in detail. 
During the project, goals were sought and methods 
decided by referring back to the initial plan at the 
beginning of each stage. 

The R&D subjects for each stage, shown in Figure 
2-2, were decided by considering the framework and 
conditions mentioned below. 

We defined 3 groups of 9 R&D subjects at the 
beginning of the initial stage by analyzing and 
rearranging the 5 groups of 10 R&D subjects 
proposed by the Fifth Generation Computer 
Committee. 

At the end of the initial stage, the basic research 
themes of machine translation and speech, figure 
and image processing were excluded from this 
project. These were excluded because computer 
vendor efforts on these technologies were recognized 
as having become very active. 

In the middle of the intermediate stage, the task 
of developing a large scale electronic dictionary was 
transferred to EDR (Electronic Dictionary Research 
Center), and development of CESP (Common ESP 
system on UNIX) was started by AIR (AI language 
Research Center). 

The basic R&D framework for promoting this 
project is the common utilization of developed 
software, achieved by unifying the software 
development environment (especially by unifying 
programming languages). By utilizing software 
development systems and tools, the results of R&D 
can be evaluated and improved. In addition, the 
nature of this project makes it difficult or 
impossible to use commercial products as a 
software development environment. 

In each stage, the languages and the software 
development environment are unified as follows. 

• Initial stage: Prolog on DEC machine 

• Intermediate stage: ESP on PSI and SIMPOS 

• Final stage: KL1 on Multi-PSI (or PIM) and 
PIMOS (PSI machines are also used as pseudo 
multi-PSI systems.) (Figure 2-6) 
Figure 2-2 Transition of R&D subjects in each 
stage 



2.3 Overview of R&D Results of 
Hardware System 

Hardware system R&D was carried out on the 
subjects listed below in each stage. 

(1) Initial stage 

(a) Functional mechanism modules and 
simulators for PIM (Parallel Inference 
Machine) of the hardware system 

(b) Functional mechanism modules and 
simulators for KBM (Knowledge Base 
Machine) of the hardware system 

(c) SIM (Sequential Inference Machine) 
hardware of pilot model for software 
development 

(2) Intermediate stage 

(a) Inference subsystem of the hardware system 

(b) Knowledge base subsystem of the hardware 
system 

(c) Pilot model for parallel software development 
of the development support system 

(3) Final stage 

(a) Prototype hardware system 

Figure 2-3 Transition of R&D results of Hardware 
System 




The major R&D results on SIM were the PSI 
(Personal Sequential Inference Machine) and CHI 
(high performance back-end inference unit). In the 
initial stage, PSI-I (® ©) was developed as a KL0 
(Kernel Language Version 0) machine. PSI-I had 
an execution speed of around 35 KLIPS (kilo 
Logical Inferences Per Second). Around 100 PSI-I 
machines were used as the main WSs (workstations) 
for the sequential logic programming language, ESP, 
in the first half of the intermediate stage. CHI-I 
(CD ©) showed an execution speed of around 200 
KLIPS by using the WAM instruction set and 
high-speed devices. In the intermediate stage, PSI 
was redesigned as the multi-PSI FEP (Front End 
Processor) and PSI-II, with a performance of around 
330-400 KLIPS. CHI was also redesigned as CHI-II 
(© ©), with a performance of more than 400 KLIPS. 
PSI-II machines were the main WSs for ESP after 
the middle of the intermediate stage, and could be 
used for KL1 by the last year of the intermediate 
stage. PSI-III was developed as a commercial 
product by a computer company by using PIM/m CPU 
technologies, with the permission of MITI, and by 
using UNIX. 

R&D on PIM continued throughout the project, as 
follows. 

• In the initial stage, experimental PIM hardware 
simulators and software simulators with 8 to 16 
processors were trial-fabricated based on data 
flow and reduction mechanisms (©©). 

• In the intermediate stage, we developed multi- 
PSI V1, which was constructed from 6 PSI-Is, as 
the first version of the KL1 machine. The 
performance of this machine was only several 
KLIPS because of the KL1 emulator (( 2 ) ©). It 
did, however, provide evaluation and experience 
through the development of a very small parallel 
OS in KL1. This meant that we could develop 
multi-PSI V2 with 64 PSI-II CPUs connected by a 
mesh network (CD @). The performance of each CPU 
for KL1 was around 150 KLIPS, and the average 
performance of the full multi-PSI V2 was 5 
MLIPS. This was a significant enough improvement 
to encourage efforts to develop various 
parallel KL1 software programs, including a 
practical OS. 

• After development of multi-PSI V2, we promoted 
the design (© @) and trial-fabrication of PIM 
experimental models ((3)©). 

• At present, we are completing development of 
prototype hardware consisting of 3 large scale 
PIM modules and 2 small scale experimental 
PIM modules ((3) @). These PIM modules are 
designed to be equally suited as KL1 machines 
for inference and knowledge base management, 
and to be able to run all programs written in 
KL1, in spite of their using different 
architectures. 
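
The multi-PSI V2 figures quoted above can be set against each other with a small calculation. The sketch below is written in Python purely for illustration, and the names `CPUS`, `PER_CPU_KLIPS`, `MEASURED_MLIPS` and `aggregate_mlips` are assumptions introduced here. It also assumes that the per-CPU figure can simply be multiplied by the number of CPUs to give an upper bound, which ignores communication over the mesh network and load imbalance.

```python
# Figures quoted in the text for multi-PSI V2.
CPUS = 64               # PSI-II CPUs connected by a mesh network
PER_CPU_KLIPS = 150.0   # approximate KL1 performance of one CPU
MEASURED_MLIPS = 5.0    # average performance of the full system


def aggregate_mlips(cpus, per_cpu_klips):
    """Naive upper bound in MLIPS: per-CPU speed times the number of CPUs."""
    return cpus * per_cpu_klips / 1000.0


peak = aggregate_mlips(CPUS, PER_CPU_KLIPS)
ratio = MEASURED_MLIPS / peak
print(f"naive upper bound: {peak:.1f} MLIPS")
print(f"average / upper bound: {ratio:.0%}")
```

Under these assumptions the quoted average is roughly half of the naive upper bound. This ratio is a derived illustration, not a measurement reported by the project.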

The VPIM system is a KL1-b language processing 
system which gives a common base for PIM 
firmware for KL1-b developed on conventional 
computers. 



R&D on KBM continued until the end of the 
intermediate stage. An experimental relational 
data base machine (Delta) with 4 relational 
algebraic engines was trial-fabricated in the initial 
stage ((D ©). During the intermediate stage, a 
deductive data base simulator was developed that 
used PSIs with an accelerator for comparison and 
searching. An experimental system with 
multiple-multiple name spaces was also developed by 
using CHI. Lastly, a knowledge base hardware 
simulator with unification engines and multi-port 
page memory was developed in this stage (© ®). 
We developed DB/KB management software, called 
Kappa, on concurrent basic software themes. At the 
beginning of the final stage, we thought that the 
adaptability of PIM with Kappa to the various 
description forms for the knowledge base was more 
important than the effectiveness of KBM with special 
mechanisms for specific KB forms. In other 
words, we thought that deductive object-oriented 
DB technologies were not yet mature enough to design 
KBM as a part of the prototype system. 

2.4 Overview of R&D Results of 
Software Systems 

The R&D of software systems was carried out on a 
number of subjects, listed below for each stage. 

(1) Initial stage 

• Basic software 

(a) 5G Kernel Languages 

(b) Problem solving and inference software 
module 

(c) Knowledge base management software 
module 

(d) Intelligent interface software module 

(e) Intelligent programming software module 

(f) SIM software of pilot model for development 
support 

(2) Basic software system in the intermediate stage 

(a)-(f) (as in the initial stage) 

(g) Experimental application system for basic 
software module 

(3) Final stage 

• Basic software system 

(a) Inference Control module 

(b) KB management module 

• Knowledge programming software 

(c) Problem solving and programming module 

(d) Natural language interface module 

(e) Knowledge construction and utilization 
module 

(f) Advanced problem solving inference method 

(g) Experimental parallel application system 

To make the R&D results easy to understand, I will 
separate the results for languages, basic software, 
knowledge programming and application software. 

2.4.1 R&D results of Fifth Generation 
Computer languages 

As the first step in 5G language development, we 
designed the sequential logic programming languages 
KL0 and ESP (Extended Self-contained Prolog) and 
developed their language processors (© ®). KL0, 
designed for the PSI hardware system, is based on 
Prolog. ESP extends KL0 with modular programming 
functions and is designed to describe large 
scale software such as SIMPOS and application 
systems. 

As a result of research on parallel logic 
programming languages, Guarded Horn Clauses, or 
GHC, was proposed as the basic specification for 
KL1 (Kernel Language Version 1) (® @). KL1 was 
then designed by adding various functions to GHC, 
such as a macro description (CD®). KL1 consists of a 
machine level language (KL1-b (base)), a core 
language (KL1-c) for writing parallel software, and 
pragma (KL1-p) to describe the division of parallel 
processes. Parallel inference machines, multi-PSI 
and PIM, are based on KL1-b. Various parallel 
software, including PIMOS, is written in KL1-c and 
KL1-p. 

A’um is an object-oriented language. The results 
of developing the A’um experimental language 
processor were reflected in improvements to KL1 (®®, (|)®). 

To research higher level languages, several 
languages were developed to aid description in 
specific research fields. CIL (Complex 
Indeterminate Language) is an extension of 
Prolog that describes meanings and situations for 
natural language processing (® @, © @). CRL 
(Complex Record Language) was developed as a 
knowledge representation language to be used 
internally for deductive databases on nested 
relational DB software ((D ©). CAL (Contrainte 
Avec Logique) is a sequential constraint logic 
language for constraint programming ((D®). 

Mandala was proposed as a knowledge 
representation language for parallel processing, but 
was not adopted because it lacks a parallel 
processing environment and we had enough 
experience with it in the initial stage (CD©). 

Quixote is designed as a knowledge
representation language and knowledge-base
language for parallel processing based on the
results of evaluation by CIL and CRL. Quixote is
also a deductive object-oriented database language
and plays the key role in KBMS. A language
processor is currently being developed for Quixote.
GDCC (Guarded Definite Clause with Constraints)
is a parallel constraint logic language that processes
CAL results.

Figure 2-4 Transition of R&D of 5G Languages
(timeline over the initial, intermediate and final stages, 1982 to 1992: sequential machine level languages; parallel machine level languages, including FGHC, KL1-c and KL1-p; high level languages, including Mandala, GDCC and Quixote)



2.4.2 R&D Results of Basic Software (OS) 



In the initial stage, we developed a preliminary
programming and operating system for PSI, called
SIMPOS, using ESP. We continued to
improve SIMPOS by adding functions
in response to evaluation results. We also took
into account the opinions of inside users who had
developed software for the PSI machine using
SIMPOS.



Since no parallel OS suited to
our aims had previously been developed anywhere in the world,
we started to study parallel OS using our
experience of SIMPOS development in the initial
stage. A small experimental PIMOS was developed
on the multi-PSI V1 system in the first half of the
intermediate stage. Then, the first version of
PIMOS was developed on the multi-PSI V2 system,
and was used by KL1 users. PIMOS
continued to be improved by the addition of
functions such as remote access, file access and
debugging support.



The Program Development Support System was
also developed by the end of the intermediate stage.

ParaGraph was developed as a parallel
programming support system for improving
concurrency and load distribution by displaying the
results of parallel processing.

In regard to DB/KB management software,
Kaiser was developed as an experimental relational
DB management software in the initial stage.
Then, Kappa-I and Kappa-II were
developed to provide the construction functions
required to build a large scale DB/KB that could be
used for natural language processing, theorem
proving and various expert systems. Kappa-I
and Kappa-II, based on the nested relational model,
are intended as the database engine of a deductive
object-oriented DBMS.

A parallel version of Kappa, Kappa-P,
is currently being developed. Kappa-P can manage
distributed databases stored on distributed disks in
PIM. Kappa-P and Quixote constitute the
KBMS.

2.4.3 R&D Results of Problem Solving and 
Programming Technologies 

Throughout this project, from the viewpoint of
the similarity between mathematical theorem proving and
program specification, we have been investigating
proving technologies. The CAP (Computer Aided
Proof) system was experimentally developed in the
initial stage. TRS (Term Rewriting System)
and Metis were also developed to support specific
mathematical reasoning, that is, inference
associated with the equals sign.

An experimental program for program verification
and composition, Argus, was developed by the end of
the intermediate stage. These
research themes were concentrated into R&D on the
MGTP theorem prover in the final stage.

Meta-programming technologies, partial
evaluation technologies and the learning
mechanism were investigated as basic research on
advanced problem solving and the inference method.

2.4.4 R&D Results on Natural Language 
Processing Technologies 

Natural language processing tools such as BUP
(Bottom-Up Parser) and a miniature electronic
dictionary were experimentally developed in the
initial stage. These tools were extended,
improved and arranged into LTB (Language Tool
Box). LTB is a library of Japanese language processing
software modules such as LAX (Lexical Analyzer),
SAX (Syntactic Analyzer), a text generator and
language databases.

An experimental discourse understanding
system, DUALS, was implemented to investigate
context processing and semantic analysis using
these language processing tools. An
experimental argument system, called Dulcinea, is
being implemented in the final stage.

2.4.5 R&D Results on Knowledge Utilization 
Technologies and Experimental 
Application Systems 

In the intermediate stage we implemented
experimental knowledge utilization tools such as
APRICOT, based on hypothetical reasoning
technology, and Qupras, based on qualitative
reasoning technology. At present, we are
investigating such inference mechanisms for expert
systems as assumption based reasoning and case
based reasoning, and implementing these as
knowledge utilization tools to be applied to the
experimental application system.

As an application system, we developed, in 
Prolog, an experimental CAD system for logic 
circuit design support and wiring support in the 
initial stage. We also developed several 
experimental expert systems such as a CAD system 
for layout and logic circuit design, a troubleshooting 
system, a plant control system and a go-playing
system written in ESP.

Small to medium parallel programs written in 
KL1 were also developed to test and evaluate 
parallel systems by the end of the intermediate 
stage. These were improved for application to PIM 
in the final stage. These programs are PAX (a 
parallel semantics analyzer), Pentomino solver, 
shortest path solver and Tsume-go. 

We developed several experimental parallel
systems, implemented using KL1 in the final stage,
such as an LSI-CAD system (for logical simulation,
wire routing, block layout and logical circuit design),
a genetic information processing system, a legal
inference system based on case based reasoning, and
expert systems for troubleshooting, plant control
and go-playing.

Some of these experimental systems were 
developed from other earlier sequential systems in 
the intermediate stage while others are new 
application fields that started in the final stage. 

2.5 Infrastructure of the FGCS 
Project 

As explained in 2.2, the main language used for 
software implementation in the initial stage was 
Prolog. In the intermediate stage, ESP was mainly 
used, and in the final stage KL1 was the principal
language.

Therefore, we used a Prolog processing system on
a conventional computer and terminals in the
initial stage. SIMPOS on PSI (I and II) was used
as the workbench for sequential programming in
the intermediate stage. We are using PSI (II and
III) as a workbench and as remote terminals to parallel
machines (multi-PSIs and PIMs) for parallel
programming in the final stage. We have also used
conventional machines for simulation to design PIM
and for a communication (e-mail, etc.) system.

Figure 2-6 Infrastructure for R&D
(timeline over the initial, intermediate and final stages, 1982 to 1992: machines for software development, simulation and communication, including general purpose computers and Unix machines; networks, including the LAN, domestic networks via public network and leased lines, and international connection via public network and leased line)

(Organization chart belonging to Figure 3-1: President, Executive Director, Managing Director, Board of Directors, Auditors, Steering Committee, Management Committee, Technology Committee; General Affairs Office with General Manager, Administration Department and International Relations Department; Research Center with Director of Research, Deputy Directors, Research Planning Department, Research Department and Research Laboratories; Project Promotion Committee and Working Groups)

In regard to the computer network system, a LAN
has been used as the in-house system, and it has
been connected to domestic and international
networks via gateway systems.

3 Promoting Organization of 
the FGCS Project 

ICOT was established in 1982 as a non-profit core 
organization for promoting this project and it began 
R&D work on fifth generation computers in June 
1982, under the auspices of MITI. 

The establishment of ICOT was decided in view of
the following reasons why a
centralized core research center was necessary and
effective for promoting original R&D:

• R&D themes should be directed and selected by
powerful leadership, in consideration of hardware
and software integration, based on a unified
framework of fifth generation computers,
throughout the ten-year project period.

• It was necessary to develop and nurture
researchers working together because of the lack of
researchers in this research field.

• A core center was needed to exchange information 
and to collaborate with other organizations and 
outside researchers. 

ICOT consists of a general affairs office and a
research center (Figure 3-1).

Figure 3-1 ICOT Organization

The organization of the ICOT research center was
changed flexibly depending on the progress being
made. In the initial stage, the research center
consisted of a research planning department and
three research laboratories. The number of
laboratories was increased to five at the beginning
of the intermediate stage. These laboratories
became one research department and seven
laboratories in 1990.

Figure 3-2 Transition of ICOT research center
organization



The number of researchers at the ICOT research 
center has increased yearly, from 40 in 1982 to 100 
at the end of the intermediate stage. 

All researchers at the ICOT research center have
been transferred from national research centers,
public organizations, computer vendors, and the
like. To encourage young creative researchers and
promote original R&D, the age of dispatched
researchers is limited to 35 years old or under. Because
researchers are normally dispatched to the ICOT
research center for three to four years, ICOT has had to
keep receiving and nurturing newly transferred researchers.
We must make considerable effort to continue to
consistently lead R&D in the fifth generation
computer field despite researcher rotation. On the other
hand, this rotation has meant that we were able to maintain a
staff of researchers in their 30s, and could also
easily change the structure of the organization in the
ICOT research center.

In total, 184 researchers have been transferred to
the ICOT research center with an average transfer
period of three years and eight months (including
around half of the dispatched researchers who are
presently at ICOT).

The number of organizations which dispatched
researchers to ICOT also increased, from 11 to 19.
This increase resulted from
expanding the scheme midway through the
intermediate stage so that the supporting
companies, around 30 in all, could dispatch
researchers to ICOT.

The themes each laboratory was responsible for 
changed occasionally depending on the progress 
being made. 

Figure 3-3 shows the present assignment of
research themes to each research laboratory.

Figure 3-3 ICOT research center organization

Every year we invited several visiting 
researchers from abroad for several weeks at ICOT's 
expense to discuss and exchange opinions on
specific research themes with ICOT researchers. Up 
to the present, we have invited 74 researchers from 
12 countries in this program. 

We also received six long-term (about one year
each) visiting researchers from foreign
governmental organizations based on
memorandums with the National Science
Foundation (NSF) in the United States, the
Institut National de Recherche en Informatique et
en Automatique (INRIA) in France, and the
Department of Trade and Industry (DTI) in the
United Kingdom (Figures 3-2 and 3-4).

Figure 3-4 shows the overall structure for
promoting this project. The entire cost of the R&D
activities of this project is supported by MITI based
on the entrustment contract between MITI and ICOT.
Yearly and at the beginning of each stage we
negotiate our R&D plan with MITI. MITI receives
advice on this R&D plan and evaluations of R&D
results and ICOT research activities from the FGCS
project advisory committee.

ICOT executes the core part of R&D and has
contracts with eight computer companies for
experimental production of hardware and
developmental software. Consequently, ICOT can
handle all R&D activities, including the
developmental work of computer companies towards
the goals of this project.

Figure 3-4 Structure for promoting FGCS project

ICOT has set up a committee and working groups to
discuss and exchange opinions on overall plans,
results and specific research themes with
researchers and research leaders from universities
and other research institutes. Of course, the
composition and themes of the working groups are
changed depending on research progress. A
working group has around 10 to
20 members, so the total number in the committee
and working groups is about 150 to 250 each year.

Other programs for information exchange,
collaborative research activities and diffusion of
research results are described in the following
chapter.

4 Distribution of R&D Results 
and International Exchange 
Activities 

Because this project is a national project in which
world-wide scientific contribution is very important,
we have made every effort to include our R&D ideas,
processes and project results when presenting ICOT
activities. We also collaborate with outside
researchers and other research organizations.

We believe these efforts have contributed to
progress in parallel and knowledge processing
computer technologies. I feel that the R&D efforts
in these fields have increased because of the
stimulative effect of this project. We hope that R&D
efforts will continue to increase through
distribution of this project's R&D results. I believe
that many outside researchers have also made
significant contributions to this project through
their discussions and information exchanges with
ICOT researchers.

We were able, for example, to produce GHC, a core
language of the parallel system, through discussion with
researchers working on Parlog and Concurrent
Prolog. We were also able to improve the performance of
the PSI system by introducing the WAM instruction
set proposed by Professor Warren.

We have several programs for distributing the 
R&D results of this project, to exchange information 
and to collaborate with researchers and 
organizations. 

(1) One important way to present R&D activities
and results is publication and distribution of 
ICOT journals and technical papers. We have 
published and distributed quarterly journals, 
which contain introductions of ICOT activities, 
and technical papers to more than 600 locations 
in 35 countries. 

We have periodically published and sent more 
than 1800 technical papers to around 30 
overseas locations. We have sent TRs 
(Technical Reports) and TMs (Technical 
Memos) on request to foreign addresses. These 
technical papers consist of more than 700 TRs 
and 1100 TMs published since the beginning of 
this project up to January 1992. A third of 
these technical papers are written in English. 

(2) In the second program ICOT researchers
discuss research matters and exchange 
information with outside researchers. 

• ICOT researchers have made more than 450 
presentations at international conferences 
and workshops, and at around 1800 domestic 
conferences and workshops. They have 
visited many foreign research organizations 
to discuss specific research themes and to 
explain ICOT activities. 



• Every year, we have welcomed around 150 to 
300 foreign researchers and specialists in 
other fields to exchange information with 
them and explain ICOT activities to them. 

• As already described in the previous chapter,
we have so far invited 74 active researchers
from specific technical fields related to FGCS
technologies. We have also received six long-term
visiting researchers dispatched from
foreign governmental organizations based on
agreements. These visiting researchers
conducted research at ICOT and published the
results of that research.

(3) We sponsored the following symposiums and
workshops to disseminate and exchange 
information on the R&D results and on ICOT 
activities. 

• We hosted the International Conference on 
FGCS’84 in November 1984. Around 1,100 
persons participated and the R&D results of 
the initial stage were presented. This 
followed the International Conference on 
FGCS’81, in which the FGCS project plan was 
presented. We also hosted the International 
Conference on FGCS’88 in November 1988. 
1,600 persons participated in this 
symposium, and we presented the R&D 
results of the intermediate stage. 

• We have held
7 Japan-Sweden (or Japan-Sweden-Italy)
workshops since 1983 (co-sponsored with
institutes or universities in Sweden and Italy),
4 Japan-France AI symposiums since 1986
(co-sponsored with INRIA of France),
4 Japan-U.S. AI symposiums since 1987
(co-sponsored with NSF of U.S.A.), and
2 Japan-U.K. workshops since 1989
(co-sponsored with DTI of U.K.).

Participating researchers have come to
know each other well through presentations
and discussions during these symposiums and
workshops.

• We have also hosted domestic symposiums on 
this project and logic programming 
conferences every year. 

(4) Because the entire R&D cost of this project has
been provided by the government, such
intellectual property rights (IPR) as patents,
which are produced in this project, belong to the
Japanese government. These IPR are managed
by AIST (Agency of Industrial Science and
Technology). Any company wishing to produce
commercial products that use any of these IPR
must get permission to use them from AIST.
For example, PSI and SIMPOS have already
been commercialized by companies licensed by
AIST. The framework for managing IPR must
ensure that IPR acquired through this
project is utilized impartially. Because AIST manages
these IPR, permission can be granted impartially to
domestic and foreign companies, whether or not they
participate in the project.

(5) Software tools developed in this project that are
not yet managed as IPR by AIST can be used by
other organizations for non-commercial aims.
These software tools are distributed by ICOT
according to the research tools permission
procedure. We now have more than 20
software tools, such as PIMOS, PDSS, Kappa-II,
the A’um system, LTB, the CAP system, the
cu-Prolog system and the TRS generator.

In other cases, we make the source codes of
some programs public by printing them in
technical papers.

(6) On specific research themes in the logic
programming field, we have collaborated with
organizations such as Argonne National
Laboratory (ANL), National Institutes of Health
(NIH), Lawrence Berkeley Laboratory (LBL),
Swedish Institute of Computer Science (SICS)
and Australian National University (ANU).

5 Forecast of Some Aspects of 
5G Machines 

LSI technologies have advanced in accordance with
past trends. Roughly speaking, the memory
capacity and the number of gates of a single chip
quadruple every three years. The number of boards
for the CPU of an inference machine was more than
ten for PSI-I, but only three for PSI-II and a single
board for PIM.

The number of boards for 80 Mbytes of memory was
16 for PSI-I, but only four for PSI-II and a single
board for PIM.

Figure 5-1 shows the anticipated trend in board 
numbers for one PE (processor element: CPU and 
memory) and cost for one PE based on the actual 
value of inference machines developed by this 
project. 

The trend shows that, by the year 2000, around
ten PEs will fit on one board, around 100 PEs will fit
in one desk side cabinet, and 500 to 1,000 PEs will
fit in a large cabinet. This trend also shows that the
cost of one PE will halve every three years.
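
The quadrupling and halving rules above amount to a simple exponential. The following is a minimal sketch, assuming the trend holds exactly; the function name and the nine-year horizon are illustrative choices and are not taken from the project data.

```python
def scale_after(years: float, factor: float, period_years: float = 3.0) -> float:
    """Multiplier reached after `years` if a quantity changes by `factor`
    every `period_years` years (factor=4 for chip capacity, 0.5 for PE cost)."""
    return factor ** (years / period_years)

# Chip capacity quadrupling every three years, over nine years: 4**3.
capacity_gain = scale_after(9, 4.0)
# Cost of one PE halving every three years, over nine years: 0.5**3.
relative_cost = scale_after(9, 0.5)
```

Under these assumptions, nine years of the stated trend would give a 64-fold gain in chip capacity and a PE cost of one eighth of the starting value.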

Figure 5-2 shows the performance trends of 5G 
machines based on the actual performance of 
inference machines developed by this project. 

Figure 5-1 Size and cost trends of 5G machines

The sequential inference processing performance
for one PE quadrupled every three years. The
improvement in parallel inference processing
performance for one PE was not as large as it was
for sequential processing, because PIM performance
is estimated at around two and one half times that
of multi-PSI. Furthermore, Figure 5-2 shows the
performance of one board for both sequential and
parallel processing, and the performance of
conventional micro-processors with CISC and RISC
technology. In this figure, future improvements in
the performance of one PE are estimated to be
rather lower than a linear extension of past values
would indicate, because it is uncertain
whether future technology will be able to deliver such
performance improvements. Performance for one
board is estimated at about 20 MLIPS, which is 100
times faster than PIM. Thus, a parallel machine
with a large cabinet size could have 1 GLIPS. These
parallel systems will have the processing speeds
needed for various knowledge processing
applications in the near future.
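
As a rough consistency check of these figures, the board count needed for 1 GLIPS can be worked out directly. This is a back-of-the-envelope sketch that assumes performance scales linearly with the number of boards (no parallel overhead), which the text does not claim; the constant names are illustrative.

```python
BOARD_MLIPS = 20      # estimated performance of one board (from the text)
TARGET_MLIPS = 1000   # 1 GLIPS expressed in MLIPS
PES_PER_BOARD = 10    # "around ten PEs will fit on one board"

# Assumption: aggregate performance is the plain sum of board performance.
boards_needed = TARGET_MLIPS // BOARD_MLIPS
pes_needed = boards_needed * PES_PER_BOARD
```

Under that assumption, 50 boards and about 500 PEs would be needed, which matches the lower end of the 500 to 1,000 PEs forecast for a large cabinet.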




Figure 5-2 Performance trends of 5G machines 

Several parallel applications in this project, such as 
CAD, theorem provers, genetic information 
processing, natural language processing, and legal 
reasoning are described in Chapter 2. These 
applications are distributed in various fields and 
aim at cultivating new parallel processing 
application fields. 

We believe that parallel machine applications
will be extended to various areas in industry and
society, because parallel technology will become
common for computers in the near future. Parallel
application fields will expand gradually as
functions are extended by the use of advanced
parallel processing and knowledge processing
technologies.



6 Final Remarks 

I believe that we have shown the basic framework 
of the fifth generation computer based on logic 
programming to be more than mere hypothesis. By 
the end of the initial stage, we had shown the fifth 
generation computer to be viable and efficient 
through the development of PSI, SIMPOS and 
various experimental software systems written in 
ESP and Prolog. 

I believe that by the end of the intermediate
stage, we had shown the possibility of realizing the
fifth generation computer through the development
of a parallel logic programming software
environment which consisted of multi-PSI and
PIMOS.

And I hope you can see the possibility of an era of 
parallel processing arriving in the near future by 
looking at the prototype system and the R&D 
results of the FGCS Project. 

Abstract

The Fifth Generation Computer Project was launched 
in 1982, with the aim of developing parallel comput- 
ers dedicated to knowledge information processing. It 
is commonly believed to be very difficult to parallelize 
knowledge processing based on symbolic computation. 
We conjectured that logic programming technology would 
solve this difficulty. 

We conducted our project while stressing two seem- 
ingly different aspects of logic programming: one was 
establishment of a new information technology, and the 
other was pursuit of basic AI and software engineering 
research. 

In the former, we developed a concurrent logic pro- 
gramming language, GHC, and its extension for practical 
parallel programming, KL1. The invention of GHC/KL1 
enabled us to conduct parallel research on the develop- 
ment of software technology and parallel hardware ded- 
icated to the new language. 

We also developed several constraint logic program- 
ming languages which are very promising as high level 
languages for AI applications. Though most of them are 
based on sequential Prolog technology, we are now in- 
tegrating constraint logic programming and concurrent 
logic programming and developing an integrated lan- 
guage, GDCC. 

In the latter, we investigated many fundamental AI 
and software engineering problems including hypotheti- 
cal reasoning, analogical inference, knowledge represen- 
tation, theorem proving, partial evaluation and program 
transformation. 

As a result, we succeeded in showing that logic pro- 
gramming provides a very firm foundation for many as- 
pects of information processing: from advanced software 
technology for AI and software engineering, through sys- 
tem programming and parallel programming, to parallel 
architecture. 

The research activities are continuing, and the latest as
well as earlier results strongly support our
conjecture and indicate that our approach is
appropriate.



1 Introduction 

In the Fifth Generation Computer Project, two main 
research targets were pursued: knowledge information 
processing and parallel processing. Logic programming 
was adopted as a key technology for achieving both tar- 
gets simultaneously. At the beginning of the project, we 
adopted Prolog as our vehicle to promote the entire re- 
search of the project. Since there were no systematic 
research attempts based on Prolog before our project, 
there were many things to do, including the development 
of a suitable workstation for the research, experimental 
studies for developing a knowledge-based system in Pro- 
log and investigation into possible parallel architecture 
for the language. We rapidly succeeded in promoting 
research in many directions. 

From this research, three achievements are worth not- 
ing. The first is the development of our own worksta- 
tion dedicated to ESP, Extended Self-contained Prolog. 
We developed an operating system for the workstation 
completely in ESP [Chikayama 88]. The second is the 
application of partial evaluation to meta programming. 
This enabled us to develop a compiler for a new program- 
ming language by simply describing an interpreter of the 
language and then partially evaluating it. We applied
this technique to derive a bottom-up parser for context
free grammars, given a bottom-up interpreter for them. In
other words, partial evaluation made meta programming
useful in real applications. The third achievement was
the development of constraint logic programming lan- 
guages. We developed two constraint logic programming 
languages: CIL and CAL. CIL is for natural language 
processing and is based on the incomplete data struc- 
ture for representing “Complex Indeterminates” in sit- 
uation theory. It has the capability to represent struc- 
tured data like Minsky’s frame and any relationship be- 
tween slots’ values can be expressed using constraints. 
CIL was used to develop a natural language understanding
system called DUALS. Another constraint logic
programming language, CAL, is for non-linear equations.
Its inference is done using the Buchberger algorithm for
computing the Gröbner basis, which is a variant of the
Knuth-Bendix completion algorithm for a term rewriting
system.
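
The second achievement above, obtaining a compiler by partially evaluating an interpreter, can be illustrated with a toy example. The sketch below is written in Python rather than in ESP or Prolog, and it is hand-staged: `specialize` is not an automatic partial evaluator, it only shows the kind of residual program one would obtain by specializing `interp` with respect to a fixed program. The expression language and all names are assumptions made for illustration.

```python
def interp(expr, env):
    """Interpreter: expr is a number, a variable name, or (op, left, right)."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    a, b = interp(left, env), interp(right, env)
    return a + b if op == "+" else a * b

def specialize(expr):
    """Residual program for a fixed expr: all dispatch on expr happens here, once."""
    if isinstance(expr, (int, float)):
        return lambda env: expr
    if isinstance(expr, str):
        return lambda env: env[expr]
    op, left, right = expr
    f, g = specialize(left), specialize(right)
    if op == "+":
        return lambda env: f(env) + g(env)
    return lambda env: f(env) * g(env)

program = ("+", ("*", "x", 3), "y")   # x * 3 + y
compiled = specialize(program)        # plays the role of compiled code
```

Calling `compiled({"x": 2, "y": 1})` gives the same value as `interp(program, {"x": 2, "y": 1})`, but the structure of `program` is no longer inspected at run time.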

We encountered one serious problem inherent to Pro- 
log: that was the lack of concurrency in the fundamental 
framework of Prolog. We recognized the importance of 
concurrency in developing parallel processing technolo- 
gies, and we began searching for alternative logic pro- 
gramming languages with the notion of concurrency. 

We noticed the work by Keith Clark and Steve Gregory 
on Relational Language [ClarkGregory 81] and Ehud 
Shapiro on Concurrent Prolog [Shapiro 83]. These lan- 
guages have a common feature of committed choice 
nondeterminism to introduce concurrency. We devoted 
our efforts to investigating these languages carefully 
and Ueda finally designed a new committed choice 
logic programming language called GHC [Ueda 86a] 
[UedaChikayama 90], which has simpler syntax than the
above two languages and still has similar expressiveness.
We recognized the importance of GHC and adopted it as 
the core of our kernel language, KL1, in this project. The 
introduction of KL1 made it possible to divide the entire 
research project into two parts: the development of par- 
allel hardware dedicated to KL1 and the development of 
software technology for the language. In this respect, the 
invention of GHC is the most important achievement for 
the success of the Fifth Generation Computer Systems 
project. 

Besides this language oriented research, we performed
much fundamental research in the field of artificial
intelligence and software engineering based on logic
and logic programming. It includes research on
nonmonotonic reasoning, hypothetical reasoning, abduction,
induction, knowledge representation, theorem proving,
partial evaluation and program transformation. We
expected that these research topics would become important
application fields for our parallel machines because of the affinity
of these problems to logic programming and logic based
parallel processing. This is now happening.

In this article, we first describe our research efforts 
in concurrent logic programming and in constraint logic 
programming. Then, we discuss our recent research ac- 
tivities in the field of software engineering and artificial 
intelligence. Finally, we conclude the paper by stating
the direction of future research.

2 Concurrent Logic Programming

In this section, we pick up two important topics in 
concurrent logic programming research in the project. 
One is the design principles of our concurrent logic 
programming language Flat GHC (FGHC) [Ueda 86a] 
[UedaChikayama 90], on which the aspects of KL1 as
a concurrent language are based. The other is search
paradigms in FGHC. As discussed later, one drawback
of FGHC, viewed as a logic programming language, is
the lack of search capability inherent in Prolog. Since
the capability is related to the notion of completeness in
logic programming, recovery of the ability is essential.

2.1 Design Principles of FGHC 

The most important feature of FGHC is that there is 
only one syntactic extension to Prolog, called the com- 
mitment operator and represented by a vertical bar “|”. 
A commitment operator divides an entire clause into two 
parts called the guard part (the left-hand side of the bar) 
and the body part (the right-hand side). The guard of a 
clause has two important roles: one is to specify a condi- 
tion for the clause to be selected for the succeeding com- 
putation, and the other is to specify the synchronization 
condition. The general rule of synchronization in FGHC 
is expressed as dataflow synchronization. This means 
that computation is suspended until sufficient data for 
the computation arrives. In the case of FGHC, guard 
computation is suspended until the caller is sufficiently 
instantiated to judge the guard condition. For example,
consider how a ticket vending machine works. After
receiving money, it has to wait until the user pushes a 
button for the destination. This waiting is described as a 
clause such that “if the user pushed the 160-yen button, 
then issue a 160-yen ticket”. 
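
The waiting behaviour of the vending machine can be imitated outside FGHC. The following Python sketch is only an analogy: `LogicVar` is a hypothetical single-assignment variable built on `threading.Event`, and `vending_machine` is an illustrative name. Reading the unbound variable plays the role of a suspended guard, and binding it plays the role of the caller instantiating the argument.

```python
import threading

class LogicVar:
    """Single-assignment variable: readers block until it is bound."""
    def __init__(self):
        self._bound = threading.Event()
        self._value = None

    def bind(self, value):
        if self._bound.is_set():
            raise ValueError("already bound")
        self._value = value
        self._bound.set()

    def read(self, timeout=None):
        if not self._bound.wait(timeout):
            raise TimeoutError("still unbound")
        return self._value

def vending_machine(button, ticket):
    pressed = button.read(timeout=5)    # suspend until a button is pushed
    if pressed == 160:                  # guard: the 160-yen button was pushed
        ticket.bind("160-yen ticket")   # body: issue a 160-yen ticket
    else:
        ticket.bind(None)               # no clause applies in this sketch

button, ticket = LogicVar(), LogicVar()
machine = threading.Thread(target=vending_machine, args=(button, ticket))
machine.start()      # the machine starts and waits for input
button.bind(160)     # the user pushes the 160-yen button
machine.join()
```

After the run, `ticket.read()` returns the issued ticket. The machine never polls; it simply cannot proceed until the data it depends on exists.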

The important thing is that dataflow synchronization 
can be realized by a simple rule governing head unifica- 
tion which occurs when a goal is executed and a corre- 
sponding FGHC clause is called: the information flow of 
head unification must be one way, from the caller to the 
callee. For example, consider a predicate representing 
service at a front desk. Two clauses define the predi- 
cate: one is for during the day, when more customers are 
expected, and another is for after-hours, when no more 
customers are expected. The clauses have such defini- 
tions as: 

serve([First | Rest]) :- <extra-condition> |
        do_service(First), serve(Rest).

serve([]) :- true | true.

Besides the serve process, there should be another pro- 
cess queue which makes a waiting queue for service. The 
top level goal looks like: 

?- queue(Xs), serve(Xs).

where ?- is a prompt to the user at the terminal. Note
that the execution of this goal generates two processes, 
queue and serve, which share a variable Xs. This shared 
variable acts as a channel for data transfer from one pro- 
cess to the other. In the above example, we assume that 
the queue process instantiates Xs and the serve pro- 
cess reads the value. In other words, queue acts as a 
generator of the value of Xs and serve acts as the con- 
sumer. The process queue instantiates Xs either to a 
list of servees represented by [<first-servee>,
<second-servee>, ...] or to an empty list []. Before the
instantiation, the value of Xs remains undefined.

Suppose Xs is undefined. Then, the head unification
invoked by the goal serve(Xs) suspends because the
equations Xs = [First | Rest] and Xs = [] cannot be
solved without instantiating Xs. But such instantiation
violates the rule of one-way unification. Note that the
term [First | Rest] in the head of serve means that
the clause expects a non-empty list to be given as the
value of the argument. Similarly, the term [] expects
an empty list to be given. Now, it is clear that the
unidirectionality of information flow realizes dataflow
synchronization.
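
The suspension rule can be imitated in ordinary Prolog. The
following is a minimal sketch, not part of FGHC or KL1: the
predicate name try_head/3 and the status atoms commit, suspend
and fail are assumptions made for illustration. It relies on the
ISO built-in subsumes_term/2, which succeeds only if the head
can be matched against the goal without binding any variable of
the goal, that is, when information flows only from the caller
to the callee.

```prolog
% try_head(+Head, +Goal, -Status)
% Hypothetical helper: classify an attempted head unification
% under the one-way rule.
%   commit  : Head matches Goal without instantiating Goal
%   suspend : Head and Goal unify only by instantiating Goal
%   fail    : Head and Goal cannot unify at all
try_head(Head, Goal, Status) :-
    copy_term(Head, H),              % fresh clause variables
    (   subsumes_term(H, Goal)
    ->  Status = commit
    ;   \+ H \= Goal                 % unifiable, but only by binding the caller
    ->  Status = suspend
    ;   Status = fail
    ).

% ?- try_head(serve([_|_]), serve(Xs), S).      % S = suspend
% ?- try_head(serve([_|_]), serve([a|T]), S).   % S = commit
% ?- try_head(serve([]),    serve([a]), S).     % S = fail
```

With Xs unbound, both clause heads of serve yield suspend, which
corresponds to the goal waiting until queue instantiates Xs.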

This principle is very important in two aspects: one is 
that the language provides a natural tool for expressing 
concurrency, and the other is that the synchronization 
mechanism is simple enough to realize very efficient par- 
allel implementation. 

2.2 Search Paradigms in FGHC 

There is one serious drawback to FGHC because of the 
very nature of committed choice; that is, it no longer 
has an automatic search capability, which is one of the 
most important features of Prolog. Prolog achieves its 
search capability by means of automatic backtracking. 
However, since committed choice uniquely determines a 
clause for succeeding computation of a goal, there is no 
way of searching for alternative branches other than the 
branch selected. The search capability is related to the 
notion of completeness of the logic programming compu- 
tation procedure and the lack of the capability is very 
serious in that respect. 

One could imagine a seemingly trivial way of real- 
izing search capability by means of or-parallel search: 
that is, to copy the current computational environment 
which provides the binding information of all variables 
that have appeared so far and to continue computations 
for each alternative case in parallel. But this does not 
work because copying non-ground terms is impossible in 
FGHC. The reason why it is impossible is that FGHC 
cannot guarantee when actual binding will occur and 
there may be a moment when a variable observed at 
some processor remains unchanged even after some goal 
has instantiated it at a different processor. 

One might ask why we did not adopt a Prolog-like 
language as our kernel language for parallel computa- 
tion. There are two main reasons. One is that, as stated 
above, Prolog does not have enough expressiveness for 
concurrency, which we see as a key feature not only for 
expressing concurrent algorithms but also for providing 
a framework for the control of physical parallelism. The 
other is that the execution mechanism of Prolog-like
languages with a search capability seemed too complicated
to develop efficient parallel implementations. 



We tried to recover the search capability by devising 
programming techniques while keeping the programming 
language as simple as possible. We succeeded in invent- 
ing several programming methods for computing all so- 
lutions of a problem which effectively achieve the com- 
pleteness of logic programming. Three of them are listed 
as follows: 

(1) Continuation-based method [Ueda 86b] 

(2) Layered stream method [OkumuraMatsumoto 87] 

(3) Query compilation method [Furukawa 92] 

In this paper, we pick up (1) and (3), which are 
complementary to each other. The continuation-based 
method is suitable for the efficient processing of rather 
algorithmic problems. An example is to compute all ways 
of partitioning a given list into two sublists by using 
append. This method mimics the computation of
OR-parallel Prolog using AND-parallelism of FGHC.
AND-serial computation in Prolog is translated to
continuation processing which remembers continuation points
in a stack. The intermediate results of computation are 
passed from the preceding goals to the next goals through 
the continuation stack kept as one of the arguments of 
the FGHC goals. This method requires input/output 
mode analysis before translating a Prolog program into 
FGHC. This requirement makes the method impracti- 
cal for database applications because there are too many 
possible input-output modes for each predicate. 
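
As a rough illustration of the continuation-based idea, the
sketch below computes all partitions of a list deterministically,
in the spirit of [Ueda 86b]. It is written in plain Prolog rather
than FGHC, and the predicate names all_splits/2, split/4 and
send/5 as well as the continuation terms top and push/2 are
assumptions, not the code of the original work. The work that
Prolog would do after returning from a recursive call to append
is recorded in an explicit continuation, and the solutions are
collected in a difference list.

```prolog
% all_splits(+Z, -Pairs): Pairs lists every X-Y with append(X, Y, Z).
all_splits(Z, Pairs) :-
    split(Z, top, Pairs, []).

% split(+Z, +Cont, -S0, ?S): the solutions form the difference list S0\S.
split([], Cont, S0, S) :-
    send(Cont, [], [], S0, S).            % only X = [], Y = []
split([A|Z], Cont, S0, S) :-
    send(Cont, [], [A|Z], S0, S1),        % first clause of append: X = []
    split(Z, push(A, Cont), S1, S).       % second clause: X = [A|X1]

% send(+Cont, +X, +Y, -S0, ?S): return a partial answer through Cont.
send(top, X, Y, [X-Y|S], S).              % top level: emit one solution
send(push(A, Cont), X, Y, S0, S) :-
    send(Cont, [A|X], Y, S0, S).          % restore the skipped head of X

% ?- all_splits([a,b], Ps).
% Ps = [[]-[a,b], [a]-[b], [a,b]-[]]
```

Note that the first argument of split/4 is treated as input only,
which is the kind of mode information the method depends on.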

The query compilation method solves this problem.
This method was first introduced by Fuchi [Fuchi 90]
when he developed a bottom-up theorem prover in KL1.
In his coding technique, the multiple binding problem is
avoided by reversing the roles of the caller and the callee in
a straightforward implementation of database query
evaluation. Instead of trying to find a record (represented
by a clause) which matches a given query pattern
represented by a goal, his method represents each query
component with a compiled clause, represents a database
with a data structure passed around by goals, tries
to find a query component clause which matches a goal
representing a record, and repeats the process for all
potentially applicable records in the database (footnote 1).
Since every record is a ground term, there is no variable in
the caller. Variable instantiation occurs when query
component clauses are searched and an appropriate clause
representing a query component is found to match a
currently processed record. Note that, as a result of
reversing the representation of queries and databases from
the straightforward representation, the information flow is
now from the caller (database) to the callee (a query
component). This inversion of information flow avoids
deadlock in query processing. Another important trick
is that each time a query clause is called, a fresh
variable is created for each variable in the query component.
This mechanism is used for making a new environment
for each OR-parallel computation branch. These tricks
make it possible to use KL1 variables to represent object
level variables in database queries and, therefore, we can
avoid different compilation of the entire database and
queries for each input/output mode of queries.

Footnote 1: We need an auxiliary query clause which matches every
record after failing to match the record to all the real query clauses.

The new coding method stated above is very gen- 
eral and there are many applications which can be pro- 
grammed in this way. The only limitation of this ap- 
proach is that the database must be more instantiated 
than queries. In bottom-up theorem proving, this re- 
quirement is referred to as the range-restrictedness of 
each axiom. Range-restrictedness means that, after suc- 
cessfully finding ground model elements satisfying the 
antecedent of an axiom, the new model element appear- 
ing as the consequent of the axiom must be ground. 

This restriction seems very strong. Indeed, there are 
problems in the theorem proving area which do not 
satisfy the condition. We need a top-down theorem 
prover for such problems. However, many real life prob- 
lems satisfy the range-restrictedness because they al- 
most always have finite concrete models. Such prob- 
lems include VLSI-CAD, circuit diagnosis, planning, and 
scheduling. We are developing a parallel bottom-up 
theorem prover called MGTP (Model Generation The- 
orem Prover) [FujitaHasegawa 91] based on SATCHMO 
developed by Manthey and Bry [MantheyBry 88]. We 
are investigating new applications to utilize the theorem 
prover. We will give an example of computing abduction 
using MGTP in Section 5. 

3 Constraint Logic Programming

We began our constraint logic programming research 
almost at the beginning of our project, in relation to 
the research on natural language processing. Mukai 
[MukaiYasukawa 85] developed a language called CIL 
(Complex Indeterminates Language) for the purpose of 
developing a computational model of situation seman- 
tics. A complex indeterminate is a data structure allow- 
ing partially specified terms with indefinite arity. During 
the design phase of the language, he encountered the idea 
of freeze in Prolog II by Colmerauer [Colmerauer 86]. He 
adopted freeze as a proper control structure for our CIL 
language. 

From the viewpoint of constraint satisfaction, CIL only
has a passive way of solving constraints, which means
that there is no active computation for solving
constraints such as constraint propagation or solving
simultaneous equations. Later, we began our research on
constraint logic programming involving active constraint
solving. The language we developed is called CAL. It
deals with non-linear equations as expressions to
specify constraints. Three events triggered the research. One
was our earlier effort to develop a term rewriting
system called METIS for a theorem prover of linear
algebra [OhsugaSakai 91]. Another was our
encounter with Buchberger's algorithm for computing the
Gröbner basis for solving non-linear equations. Since the
algorithm is a variant of the Knuth-Bendix completion
algorithm for a term rewriting system, we were able to
develop the system easily from our experience of
developing METIS. The third event was the development of
the CLP(X) theory by Jaffar and Lassez, which provides
a framework for constraint logic programming languages
[JaffarLassez 86].

There is further remarkable research on constraint 
logic programming in the field of general symbol pro- 
cessing [Tsuda 92]. Tsuda developed a language called 
cu-Prolog. In cu-Prolog, constraints are solved by means 
of program transformation techniques called unfold/fold 
transformation (these will be discussed in more detail 
later in this paper, as an optimization technique in re- 
lation to software engineering). The unfold/fold pro- 
gram transformation is used here as a basic operation 
for solving combinatorial constraints among terms. Each 
time the transformation is performed, the program is 
modified to a syntactically less constrained program. 
Note that this basic operation is similar to term rewrit- 
ing, a basic operation in CAL. Both of these operations 
try to rewrite programs to get certain canonical forms. 
The idea of cu-Prolog was introduced by Hasida during 
his work on dependency propagation and dynamical pro- 
gramming [Hasida 92]. They succeeded in showing that 
context-free parsing, which is as efficient as chart parsing,
emerges as a result of dependency propagation during the 
execution of a program given as a set of grammar rules 
in cu-Prolog. Actually, there is no need to construct a 
parser. cu-Prolog itself works as an efficient parser. 

Hasida [Hasida 92] has been working on a fundamental 
issue of artificial intelligence and cognitive science from 
the aspect of a computational model. In his computa- 
tion model of dynamical programming, computation is 
controlled by various kinds of potential energies associ- 
ated with each atomic constraint, clause, and unification. 
Potential energy reflects the degree of constraint
violation and, therefore, the reduction of energy contributes
to constraint resolution.

Constraint logic programming greatly enriched the 
expressiveness of Prolog and is now providing a very 
promising programming environment for applications by 
extending the domain of Prolog to cover most AI prob- 
lems. 

One big issue in our project is how to integrate con- 
straint logic programming with concurrent logic pro- 
gramming to obtain both expressiveness and efficiency. 

This integration, however, is not easy to achieve
because (1) constraint logic programming focuses on a
control scheme for efficient execution specific to each
constraint solving scheme, and (2) constraint logic
programming essentially includes a search paradigm which
requires some suitable support mechanism such as
automatic backtracking.

It turns out that the first problem can be processed
efficiently, to some extent, in the concurrent logic
programming scheme utilizing the dataflow control method. We
developed an experimental concurrent constraint logic
programming language called GDCC (Guarded Definite
Clauses with Constraints), implemented in KL1
[HawleyAiba 91]. GDCC is based on an ask-tell
mechanism proposed by Maher [Maher 87], and extended by
Saraswat [Saraswat 89]. It extends the guard computation
mechanism from a simple one-way unification solving
problem to a more general provability check of
conditions in the guard part under a given set of constraints
using the ask operation. For the body computation,
constraint literals appearing in the body part are added to
the constraint set using the tell operation. If the guard
conditions are not known to be provable because of a
lack of information in the constraint set, then
computation is suspended. If the conditions are disproved
under the constraint set, then the guard computation fails.
Note that the provability check controls the order of
constraint solving execution. New constraints appearing in
the body of a clause are not included in the constraint
set until the guard conditions are known to be provable.
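
The three possible outcomes of a guard check can be sketched as
follows. This is only an illustration under strong simplifying
assumptions, not GDCC itself: the constraint store is a list of
Name-Value pairs, the only constraints are equations
Name = Value over ground values, and the predicates ask/3,
tell/3 and lookup/3 are hypothetical names.

```prolog
% ask(+Constraint, +Store, -Answer)
%   entailed    : the store proves the constraint (guard succeeds)
%   disentailed : the store refutes it            (guard fails)
%   suspend     : the store does not decide it yet
ask(Name = Value, Store, Answer) :-
    (   lookup(Name, Store, Known)
    ->  (   Known == Value
        ->  Answer = entailed
        ;   Answer = disentailed
        )
    ;   Answer = suspend
    ).

% tell(+Constraint, +Store0, -Store): add a body constraint.
% No consistency check is made in this sketch.
tell(Name = Value, Store, [Name-Value|Store]).

% lookup(+Name, +Store, -Value): first recorded value of Name, if any.
lookup(Name, [N-V|Rest], Value) :-
    (   Name == N
    ->  Value = V
    ;   lookup(Name, Rest, Value)
    ).

% ?- ask(button = 160, [], A).                              % A = suspend
% ?- tell(button = 160, [], S), ask(button = 160, S, A).    % A = entailed
% ?- ask(button = 200, [button-160], A).                    % A = disentailed
```

A guard that receives suspend would simply be retried after a
later tell has extended the store.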

The second problem of realizing a search paradigm in a
concurrent constraint logic programming framework has
not been solved so far. One obvious way is to develop an
OR-parallel search mechanism which uses a full
unification engine implemented using ground term
representation of logical variables [Koshimura et al. 91].
However, such a unifier is 10 to 100 times slower
than the built-in unifier and, as such, it is not very
practical. Another possible solution is to adopt the new coding
technique introduced in the previous section. We expect
to be able to efficiently introduce the search paradigm by
applying the coding method. The paradigm is crucial if
parallel inference machines are to be made useful for the
numerous applications which require high levels of both
expressive and computational power.

4 Advanced Software Engineering

Software engineering aims at supporting software
development in various dimensions: increased software
productivity, development of high quality software, pursuit
of easily maintainable software, and so on. Logic
programming has great potential for many dimensions in
software engineering. One obvious advantage of logic
programming is its affinity for correctness proofs
against given specifications. Automatic debugging is a related
issue. Also, there is a high possibility of achieving
automatic program synthesis from specifications by applying
proof techniques as well as from examples by applying
induction. Program optimization is another promising
direction where logic programming works very well.

In this section, two topics are picked up: (1) meta 
programming and its optimization by partial evaluation, 
and (2) unfold/fold program transformation. 

4.1 Meta Programming and Partial Evaluation

We investigated meta programming technology as a
vehicle for developing knowledge-based systems in a logic
programming framework inspired by Bowen and
Kowalski's work [BowenKowalski 83]. It was a rather direct
way to realize a knowledge assimilation system using the
meta programming technique by regarding integrity
constraints as meta rules which must be satisfied by a
knowledge base. One big problem of the approach was its
inefficiency due to the meta interpretation overhead of each
object level program. We took up this problem, and
Takeuchi and Furukawa [TakeuchiFurukawa 86] made a
breakthrough by applying the optimization
technique of partial evaluation to meta programs.
We first derived an efficient compiled program for an
expert system with uncertainty computation given a meta
interpreter of rules with certainty factors. In this
program, we succeeded in getting a threefold speedup over
the original program. Then, we tried the less trivial
problem of developing a meta interpreter of a bottom-up
parser and deriving an efficient compiled program given
the interpreter and a set of grammar rules. We
succeeded in obtaining an object program known as BUP,
developed by Matsumoto [Matsumoto et al. 83]. The
importance of the BUP meta-interpreter is that it is not
a vanilla meta-interpreter (an obvious extension of the
Prolog interpreter in Prolog), because the control
structure is totally different from Prolog's top-down control
structure.

After our first success of applying partial evaluation 
techniques in meta programming, we began the devel- 
opment of a self-applicable partial evaluator. Fujita and 
Furukawa [FujitaFurukawa 88] succeeded in developing a 
simple self-applicable partial evaluator. We showed that
the partial evaluator itself was a meta interpreter very 
similar to the following Prolog interpreter in Prolog: 

solve(true) . 

solve((A,B)) :- solve(A), solve(B). 

solve(A) :- clause(A,B), solve(B). 

where it is assumed that for each program clause,
H :- B, a unit clause, clause(H, B), is asserted (footnote 2).
A goal, solve(G), simulates an immediate execution of the
subject goal, G, and obtains the same result.

Footnote 2: clause(_,_) is available as a built-in procedure in the
DECsystem-10 Prolog system.

This simple definition of a Prolog self-interpreter,
solve, suggests the following partial solver, psolve.

psolve(true, true).
psolve((A,B), (RA,RB)) :-
        psolve(A, RA), psolve(B, RB).
psolve(A, R) :-
        clause(A, B), psolve(B, R).
psolve(A, A) :- '$suspend'(A).

The partial solver, psolve(G,R), partially solves a
given goal, G, returning the result, R. The result, R,
is called the residual goal(s) for the given goal, G. The
residual goal may be true when the given goal is totally
solved; otherwise it will be a conjunction of subgoals,
each of which is a goal, Ri, whose solution was suspended
by '$suspend'(Ri) for some reason. An auxiliary
predicate, '$suspend'(P), is defined for each goal pattern,
P, by the user.

Note that psolve is related to solve as: 

solve(G) :- psolve(G,R), solve(R).

That is, a goal, G, succeeds if it is partially solved with
the residual goal, R, and R in turn succeeds (is totally 
solved). The total solution for G is thus split into two 
tasks: partial solution for G and total solution for the 
residual goal, R. 
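
The split into a partial and a total solution can be tried out
with the following sketch. It is an adaptation under stated
assumptions, not the authors' evaluator: the object program is
stored as rule/2 facts instead of being read through clause/2,
the user-defined suspension condition is called suspend/1
instead of '$suspend'/1, and the suspension test is tried before
unfolding so that each goal has a single residual.

```prolog
% Object program, stored as rule(Head, Body) facts (an assumption
% made here so that clause/2 on static code is not needed).
rule(app([], Ys, Ys), true).
rule(app([X|Xs], Ys, [X|Zs]), app(Xs, Ys, Zs)).

% suspend(+Goal): user-supplied condition; here an app/3 goal is
% left unsolved while its first argument is unbound.
suspend(app(Xs, _, _)) :- var(Xs).

% psolve(+Goal, -Residual)
psolve(true, true).
psolve((A, B), (RA, RB)) :-
    psolve(A, RA),
    psolve(B, RB).
psolve(A, R) :-
    A \= true, A \= (_, _),
    (   suspend(A)
    ->  R = A                       % keep A as a residual goal
    ;   rule(A, B),
        psolve(B, R)                % unfold A by one clause
    ).

% solve(+Goal): total solution over the same rule/2 facts.
solve(true).
solve((A, B)) :- solve(A), solve(B).
solve(A) :- A \= true, A \= (_, _), rule(A, B), solve(B).

% ?- psolve(app([a,b|T], [c], Z), R).
% Z = [a,b|Zs], R = app(T, [c], Zs)
```

Here the known prefix [a,b] is consumed at partial evaluation
time, and the residual goal app(T, [c], Zs) is what remains to
be handed to solve once T becomes known.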

We developed a self-applicable partial evaluator by 
modifying the above psolve program. The main modi- 
fication is the treatment of built-in predicates in Prolog 
and those predicates used to define the partial evaluator 
itself to make it self-applicable. We succeeded in
applying the partial evaluator to itself and generated a
compiler by partially evaluating the psolve program with
respect to a given interpreter, using the identical psolve. 
We further succeeded in obtaining a compiler generator, 
which generates different compilers given different inter- 
preters, by partially evaluating the psolve program with 
respect to itself, using itself. 

Theoretically, it was known that self-application of 
a partial evaluator generates compilers and a compiler 
generator [Futamura 71]. There were many attempts 
to realize self-applicable partial evaluators in the frame- 
work of functional languages for a long time, but no suc- 
cesses were reported until very recently [Jones et al. 85], 
[Jones et al. 88], [GomardJones 89]. On the other hand,
we succeeded in developing a self-applicable partial eval- 
uator in a Prolog framework in a very short time and 
also in a very elegant way. This demonstrates some merits of
logic programming languages over functional programming
languages, especially in their binding scheme based
on unification.

4.2 Unfold/Fold Program Transformation

Program transformation provides a powerful
methodology for the development of software, especially the
derivation of efficient programs either from their formal
specification or from declarative but possibly inefficient
programs. Programs written in declarative form are
often inefficient under Prolog's standard left-to-right
control rule. Typical examples are found in programs based
on a generate-and-test paradigm. Seki and Furukawa
[SekiFurukawa 87] developed a program transformation
method based on unfolding and folding for such
programs. We will explain the idea in some detail. Let
gen_test(L) be a predicate defined as follows:

gen_test(L) :- gen(L), test(L).

where L is a variable representing a list, gen(L) is a gen- 
erator of the list L, and test(L) is a tester for L. Assume 
both gen and test are incremental and are defined as 
follows: 

gen([]).
gen([X|L]) :- gen_element(X), gen(L).

test([]).
test([X|L]) :- test_element(X), test(L).

Then, it is possible to fuse two processes gen and test 
by applying unfold/fold transformation as follows: 

gen_test([X|L]) :- gen([X|L]), test([X|L]).

        unfold at gen and test

gen_test([X|L]) :- gen_element(X), gen(L),
        test_element(X), test(L).

        fold by gen_test

gen_test([X|L]) :- gen_element(X),
        test_element(X), gen_test(L).
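
The effect of the fusion can be checked on a small instance. In
the sketch below, gen_element/1 and test_element/1 are
hypothetical (elements are drawn from 1..3 and must be odd), and
the fused predicate is renamed gen_test_fused/1 so that both
versions can be loaded together. The base clause
gen_test_fused([]) is obtained by unfolding gen([]) and
test([]); the derivation above shows only the recursive clause.

```prolog
:- use_module(library(lists)).        % member/2

% Hypothetical element generator and element test.
gen_element(X)  :- member(X, [1, 2, 3]).
test_element(X) :- X mod 2 =:= 1.

gen([]).
gen([X|L]) :- gen_element(X), gen(L).

test([]).
test([X|L]) :- test_element(X), test(L).

% Original generate-and-test program.
gen_test(L) :- gen(L), test(L).

% Fused program: each element is tested as soon as it is generated.
gen_test_fused([]).
gen_test_fused([X|L]) :-
    gen_element(X), test_element(X), gen_test_fused(L).

% ?- length(L, 2), gen_test(L).        % [1,1] ; [1,3] ; [3,1] ; [3,3]
% ?- length(L, 2), gen_test_fused(L).  % the same answers, same order
```

The list length is fixed by length/2 in the queries because gen
alone would otherwise enumerate lists of unbounded length. The
fused version rejects an even first element immediately instead
of first generating the rest of the list.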

If the tester is not incremental, the above unfold/fold 
transformation does not work. One example is to test 
that all elements in the list are different from each other. 
In this case, the test predicate is defined as follows: 

test([]).
test([X|L]) :- non_member(X, L), test(L).

non_member(_, []).
non_member(X, [Y|L]) :-
        dif(X, Y), non_member(X, L).

where dif(X,Y) is a predicate judging that X is not equal
to Y. Note that this test predicate is not incremental
because a test for the first element X of the list requires the
information of the entire list. The solution we gave to
this problem was to replace the test predicate with an
equivalent predicate with incrementality. Such an
equivalent program test' is obtained by adding an
accumulator as an extra argument of the test predicate, defined
as follows:
test'([], _).
test'([X|L], Acc) :-
        non_member(X, Acc), test'(L, [X|Acc]).

The relationship between test and test' is given by
the following theorem:

Theorem

        test(L) = test'(L, [])

Now, the original gen_test program becomes 

gen_test(L) :- gen(L), test'(L, []).

We need to introduce the following new predicate to per- 
form the unfold/fold transformation: 

gen_test'(L, Acc) :- gen(L), test'(L, Acc).

By applying a similar transformation process as
before, we get the following fused recursive program of
gen_test':

gen_test'([], _).
gen_test'([X|L], Acc) :- gen_element(X),
        non_member(X, Acc), gen_test'(L, [X|Acc]).
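
A runnable version of this example is sketched below. Since a
quote is not allowed in a Prolog predicate name, the fused
predicate is called gen_test_acc/2 here; that name, the
generator gen_element/1 over 1..3 and the helper different/2
(standing in for dif on ground arguments) are assumptions made
for illustration.

```prolog
:- use_module(library(lists)).            % member/2

gen_element(X) :- member(X, [1, 2, 3]).   % hypothetical generator

gen([]).
gen([X|L]) :- gen_element(X), gen(L).

% The dif of the text, restricted here to ground arguments.
different(X, Y) :- X \== Y.

non_member(_, []).
non_member(X, [Y|L]) :- different(X, Y), non_member(X, L).

% Non-incremental tester: X is compared with the rest of the list.
test([]).
test([X|L]) :- non_member(X, L), test(L).

gen_test(L) :- gen(L), test(L).

% Fused program with accumulator: X is compared with the elements
% generated before it.
gen_test_acc([], _).
gen_test_acc([X|L], Acc) :-
    gen_element(X), non_member(X, Acc), gen_test_acc(L, [X|Acc]).

% ?- length(L, 2), gen_test(L).
% ?- length(L, 2), gen_test_acc(L, []).
% both: [1,2] ; [1,3] ; [2,1] ; [2,3] ; [3,1] ; [3,2]
```

In the fused program a repeated element is rejected as soon as
it is generated, whereas the original program generates the
complete list before the first comparison is made.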

By symbolically computing the two goals

?- test([X1,...,Xn]).
?- test'([X1,...,Xn], []).

and comparing the results, one can find that the
reordering of pair-wise comparisons by the introduction of the
accumulator is analogous to the exchange of double
summation,
$\sum_{i=1}^{n} \sum_{j=i+1}^{n} a_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{j-1} a_{ij}$.
Therefore, we refer
to this property as structural commutativity.

One of the key problems of unfold/fold transformation
is the introduction of a new predicate such as gen_test'
in the last example. Kawamura [Kawamura 91]
developed a syntactic rule for finding suitable new predicates.
There had been several attempts to find appropriate new
predicates using domain-dependent heuristic knowledge,
such as append optimization by the introduction of
difference list representation. Kawamura's work provides
some general criteria for selecting candidates for new
predicates. His method first analyzes a given program
to be transformed and makes a list of patterns which
may possibly appear in the definition of new predicates.
This can be done by unfolding a given program and
properly generalizing all resulting patterns to represent them
with a finite number of distinct patterns while
avoiding over-generalization. One obvious strategy to avoid
over-generalization is to perform the least general
generalization of Plotkin [Plotkin 70]. Kawamura also
introduced another strategy for suppressing unnecessary
generalization: the subset of clauses whose heads are
unifiable with a pattern is associated with that pattern,
and only those patterns having the same associated
subset of clauses are generalized. Note that a goal pattern
is unfolded only by clauses belonging to the associated
subset. Therefore the suppression of over-generalization
also suppresses unnecessary expansion of clauses by
unnecessary unfolding.
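
Least general generalization itself is easy to state as a
program. The following is a minimal sketch of anti-unification
of two terms; it is not Kawamura's procedure, and the predicate
names lgg/3, lgg/5, lgg_args/5 and seen/4 are assumptions. The
same pair of differing subterms is always replaced by the same
variable, which is what keeps the result least general.

```prolog
% lgg(+S, +T, -G): G is a least general generalization of S and T.
lgg(S, T, G) :-
    lgg(S, T, G, [], _).

% lgg(+S, +T, -G, +Map0, -Map): Map remembers which fresh variable
% stands for each pair of differing subterms.
lgg(S, T, G, M0, M) :-
    (   S == T
    ->  G = S, M = M0
    ;   compound(S), compound(T),
        functor(S, F, N), functor(T, F, N)
    ->  S =.. [F|As], T =.. [F|Bs],
        lgg_args(As, Bs, Gs, M0, M),
        G =.. [F|Gs]
    ;   seen(S, T, M0, V)
    ->  G = V, M = M0                 % reuse the variable for this pair
    ;   M = [pair(S, T, G)|M0]        % G stays a fresh variable
    ).

lgg_args([], [], [], M, M).
lgg_args([A|As], [B|Bs], [G|Gs], M0, M) :-
    lgg(A, B, G, M0, M1),
    lgg_args(As, Bs, Gs, M1, M).

% seen(+S, +T, +Map, -V): V is the variable already chosen for S/T.
seen(S, T, [pair(S0, T0, V0)|Rest], V) :-
    (   S == S0, T == T0
    ->  V = V0
    ;   seen(S, T, Rest, V)
    ).

% ?- lgg(p(a, f(a)), p(b, f(b)), G).
% G = p(X, f(X))
```

Replacing each differing pair by an unrelated variable would
give p(X, f(Y)) in this example, which is the kind of
over-generalization the text warns against.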

5 Logic-based AI Research 

For a long time, deduction has played a central role in 
research on logic and logic programming. Recently, two
other kinds of inference, abduction and induction, have received much
attention and much research has been done in these new
directions. These directions are related to fundamental 
AI problems that are open-ended by their nature. They 
include the frame problem, machine learning, distributed 
problem solving, natural language understanding, com- 
mon sense reasoning, hypothetical reasoning and ana- 
logical reasoning. These problems require non-deductive 
inference capabilities in order to solve them. 

Historically, most AI research on these problems 
adopted ad hoc heuristic methods reflecting problem 
structures. There was a tendency to avoid a logic based 
formal approach because of a common belief in the lim- 
itation of the formalism. However, the limitation of log- 
ical formalism comes only from the deductive aspect of 
logic. Recently it has been widely recognized that ab- 
duction and induction based on logic provide a suitable 
framework for such problems requiring open-endedness 
in their formalism. There is much evidence to support 
this observation. 

• In natural language understanding, unification 
grammar is playing an important role in integrat- 
ing syntax, semantics, and discourse understanding. 

• In non-monotonic reasoning, logical formalism such 
as circumscription and default reasoning and its 
compilation to logic based programs are studied ex- 
tensively. 

• In machine learning, there are many results based 
on logical frameworks such as the Model Inference 
System, inverse resolution, and least general gener- 
alization. 

• In analogical reasoning, analogy is naturally de- 
scribed in terms of a formal inference rule similar to 
logical inference. The associated inference is deeply 
related to abductive inference. 

In the following, three topics related to these issues 
are picked up: they are hypothetical reasoning, analogy, 
and knowledge representation. 
5.1 Hypothetical Reasoning

A logical framework of hypothetical reasoning was stud- 
ied by Poole et al. [Poole et al. 87]. They discussed the 
relationship among hypothetical reasoning, default rea- 
soning and circumscription, and argued that hypotheti- 
cal reasoning is all that is needed because it is simply and 
efficiently implemented and is powerful enough to imple- 
ment other forms of reasoning. Recently, the relation- 
ship of these formalisms was studied in more detail and 
many attempts were made to translate non-monotonic 
reasoning problems into equivalent logic programs with 
negation as failure. 

Another direction of research was the formulation of 
abduction and its relationship to negation as failure. 
There was also a study of the model theory of a class 
of logic programs, called general logic programs, allow- 
ing negation by failure in the definition of bodies in the 
clausal form. By replacing negation-by-failure predicates 
by corresponding abducible predicates which usually give 
negative information, we can formalize negation by
failure in terms of abduction [EshghiKowalski 89].

A proper semantics of general logic programs is given
by stable model semantics [GelfondLifschitz 88]. It is a
natural extension of least fixpoint semantics. The
difference is that there is no T_P operator to compute the
stable model directly, because we need a complete model for
checking the truth value of negation-by-failure
literals in bottom-up fixpoint computation. Therefore, we
need to refer to the model in the definition of the model.
This introduces great difficulty in computing stable
models. The trivial way is to assume all possible models and
see whether each assumed model is the least one
satisfying the program or not. This algorithm needs to search
through all possible subsets of the atoms that can be
generated by the program and is not realistic at all.
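
The trivial guess-and-check procedure can be written down
directly for ground programs. The sketch below is only meant to
make the definition concrete; the encoding of rules as
rule(Head, Pos, Neg) terms and all predicate names are
assumptions. For a candidate model M, rules blocked by M are
deleted, the remaining negative literals are dropped, and M is
accepted if it equals the least model of the resulting definite
program.

```prolog
:- use_module(library(lists)).     % member/2, memberchk/2

% A ground rule  H <- P1,...,Pk, not N1,...,not Nj  is written
% rule(H, [P1,...,Pk], [N1,...,Nj])  (a hypothetical encoding).

% stable_model(+Rules, +Atoms, -M): guess M, then check it.
stable_model(Rules, Atoms, M) :-
    subset_of(Atoms, M),                    % guess a candidate model
    reduct(Rules, M, Definite),             % remove negation w.r.t. M
    least_model(Definite, [], Least),
    sort(M, Sorted), sort(Least, Sorted).   % M must equal the least model

subset_of([], []).
subset_of([X|Xs], [X|Ys]) :- subset_of(Xs, Ys).
subset_of([_|Xs], Ys)     :- subset_of(Xs, Ys).

% reduct: delete rules blocked by M, strip the other negative literals.
reduct([], _, []).
reduct([rule(H, Pos, Neg)|Rs], M, Ds) :-
    (   member(A, Neg), memberchk(A, M)
    ->  Ds = Ds1
    ;   Ds = [H-Pos|Ds1]
    ),
    reduct(Rs, M, Ds1).

% least_model: naive bottom-up fixpoint of a definite program.
least_model(Ds, M0, M) :-
    (   member(H-Pos, Ds), \+ memberchk(H, M0), all_in(Pos, M0)
    ->  least_model(Ds, [H|M0], M)
    ;   M = M0
    ).

all_in([], _).
all_in([A|As], M) :- memberchk(A, M), all_in(As, M).

% ?- findall(M, stable_model([rule(p,[],[q]), rule(q,[],[p])], [p,q], M), Ms).
% Ms = [[p], [q]]
```

The call to subset_of/2 enumerates all 2^n subsets of the n
atoms, which is the exponential search referred to above.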

Inoue et al. [Inoue et al. 92] developed a much more efficient
algorithm for computing all stable models of general logic
programs. Their algorithm is based on the bottom-up model
generation method. Negation-by-failure literals are used
to introduce hypothetical models: some which assume
the truth of the literals and others which assume
that they are false. To express assumed literals, they
introduce a modal operator. More precisely, they translate
each rule of the form:

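$$A_l \leftarrow A_{l+1} \wedge \cdots \wedge A_m \wedge \mathit{not}\ A_{m+1} \wedge \cdots \wedge \mathit{not}\ A_n$$

to the following disjunctive clause which does not contain
any negation-by-failure literals:

$$A_{l+1} \wedge \cdots \wedge A_m \rightarrow (\mathrm{NK}A_{m+1} \wedge \cdots \wedge \mathrm{NK}A_n \wedge A_l) \vee \mathrm{K}A_{m+1} \vee \cdots \vee \mathrm{K}A_n .$$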

The reason why we express the clause with the
antecedent on the left-hand side is that we intend to use
this clause in a bottom-up way; that is, from left to right.
In this expression, NKA means that we assume that A is
false, whereas KA means that we assume that A is true.
Although K and NK are modal operators, we can treat
KA and NKA as new predicates independent from A by
adding the following constraints:

$$\mathrm{NK}A,\ A \rightarrow \qquad \text{for every atom } A. \qquad (1)$$

$$\mathrm{NK}A,\ \mathrm{K}A \rightarrow \qquad \text{for every atom } A. \qquad (2)$$

By this translation, we obtain a set of clauses in first 
order logic and therefore it is possible to compute all 
possible models for the set using a first order bottom-up 
theorem prover, MGTP, described in Section 2. After 
computing all possible models for the set of clauses, we 
need to select only those models M which satisfy the 
following condition: 

For every ground atom $A$, if $\mathrm{K}A \in M$, then $A \in M$.    (3)

Note that this translation scheme defines a coding 
method of original general logic programs which may 
contain negation by failure in terms of pure first order 
logic. Note also that the same technique can be applied 
in computing abduction, which means to find possible 
sets of hypotheses explaining the observation and not 
contradicting given integrity constraints. 

Satoh and Iwayama [SatohIwayama 92] independently
developed a top-down procedure for answering queries to 
a general logic program with integrity constraints. They 
modified an algorithm proposed by Eshghi and Kowalski 
[EshghiKowalski 89] to correctly handle situations where 
some proposition must hold in a model, like the require- 
ment of (3). 

Iwayama and Satoh [IwayamaSatoh 91] developed a 
mixed strategy combining bottom-up and top-down 
strategies for computing the stable models of general 
logic programs with constraints. The procedure is ba- 
sically bottom-up. The top-down computation is related 
to the requirement of (3) and as soon as a hypothesis of 
KA is asserted in some model, it tries to prove A by a 
top-down expectation procedure. 

The formalization of abductive reasoning has a wide 
range of applications including computer aided design 
and fault diagnosis. Our approach provides a uniform 
scheme for representing such problems and solving them. 
It also provides a way of utilizing our parallel inference 
machine, PIM, for solving these complex AI problems.

5.2 Formal Approach to Analogy 

Analogy is an important reasoning method in human 
problem solving. Analogy is very helpful for solving 
problems which are very difficult to solve by themselves. 
Analogy guides the problem solving activities using the 
knowledge of how to solve a similar problem. Another 
aspect of analogy is to extract good guesses even when 
there is not enough information to explain the answer. 

There are three major problems to be solved in order 
to mechanize analogical reasoning [Arima 92]: 
• searching for an appropriate base of analogy with
respect to a given target, 

• selecting important properties shared by a base and 
a target, and 

• selecting properties to be projected through an anal- 
ogy from a base to a target. 

Although there has been much work on mechanizing
analogy, most of it addresses the issues listed above
only partly. Arima [Arima 92] proposed an approach that
attempts to answer all of them at once. Before explaining
his idea, we need to define some terminology.

Analogical reasoning is expressed as the following in- 
ference rule: 

$$\frac{S(B) \wedge P(B) \qquad S(T)}{P(T)}$$

where T represents the target object, B the base object, 
S the similarity property between T and B, and P the 
projected property. 

This inference rule expresses that if we assume an ob- 
ject T is similar to another object B in the sense that 
they share a common property S then, if B has another 
property P, we can analogically reason that T also has 
the same property P. Note the syntactic similarity of
this rule to modus ponens. If we generalize the object B
to a universally quantified variable X and replace the
conjunction with an implication, then the first expression
of the rule becomes $S(X) \supset P(X)$, and the entire
rule becomes modus ponens.

Arima [Arima 92] tried to link analogical reasoning
to deductive reasoning by modifying the expression
$S(B) \wedge P(B)$ to

$\forall x.(J(x) \wedge S(x) \supset P(x))$,   (4)

where J(x) is a hypothesis added to S(x) in order to
logically conclude P(x). If such a J(x) exists, then
the analogical reasoning becomes purely deductive
reasoning. For example, let us assume that there is a
student ($Student_B$) who belongs to an orchestra club
and also neglects study. If one happens to know that
another student ($Student_T$) belongs to the orchestra
club, then we tend to conclude that he also neglects
study. We derive this conclusion because we guess that
the orchestra club is very active and that student
members of this busy club tend to neglect study. This
guess is an example of the hypothesis mentioned above.

Arima analyzed the syntactic structure of the above 
J(x) by carefully observing the analogical situation. 
First, we need to find proper parameters for the
predicate J. Since it depends not only on an object but
also on the similarity property and the projected
property, we assume that J has the form J(x, s, p),
where s and p represent the similarity property and the
projected property.

From the nature of analogy, we do not expect that 
there is any direct relationship between the object x and 
the projected property p. Therefore, the entire J(x,s,p) 
can be divided into two parts: 

$J(x,s,p) = J_{att}(s,p) \wedge J_{obj}(x,s)$.   (5)

The first component, $J_{att}(s,p)$, corresponds to
information extracted from a base. It does not depend
on x because information in the base of the analogy is
independent of the choice of an object x.

The second component, $J_{obj}(x,s)$, corresponds to
information extracted from the similarity and therefore
does not contain p as a parameter.

Example: Negligent Student 

First, let us formally describe the hypothesis described 
earlier to explain why an orchestra member is negligent 
of study. It is expressed as follows: 

$$\forall x,s,p.\,(Enthusiastic(x,s) \wedge BusyClub(s) \wedge Obstructive\_to(p,s) \wedge Member\_of(x,s) \supset Negligent\_of(x,p)) \qquad (6)$$

where x, s, and p are variables representing a person, a
club and some human activity, respectively. The meaning
of each predicate is easy to understand, so explanations
are omitted. Since we know that both $Student_B$ and
$Student_T$ are members of an orchestra,
$Member\_of(x,s)$ corresponds to the similarity property
S(x) in (4). On the other hand, since we want to reason
about the negligence of a student, the projected
property P(x) is $Negligent\_of(x,p)$. Therefore, the
rest of the expression in (6),
$Enthusiastic(x,s) \wedge BusyClub(s) \wedge Obstructive\_to(p,s)$,
corresponds to J(x,s,p). From the syntactic features of
this expression, we can conclude that

$J_{obj}(x,s) = Enthusiastic(x,s)$,

$J_{att}(s,p) = BusyClub(s) \wedge Obstructive\_to(p,s)$.

The reason why we need $J_{obj}$ is that we are not
always aware of an important similarity like Enthusiastic.
Therefore, we need to infer an important hidden
similarity from the given similarity, such as Member_of.
This inference requires an extra effort in order to apply
the above framework of analogy.

The restriction on the syntactic structure of J(x,s,p)
is very important since it can be used to prune a search 
space to access the right base case given the target. This 
function is particularly important when we apply our 
analogical inference framework to case based reasoning 
systems. 
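
The decomposition in (5) and the rule in (6) can be sketched as follows. This is an illustration only: the fact base, the extra member `Student_U`, and the function names are assumptions made for the example, and the Enthusiastic, BusyClub and Obstructive_to entries stand for the hypothesised knowledge J rather than for observations.

```python
# Hypothetical fact base for the negligent-student example.
FACTS = {
    ("Member_of", "Student_B", "orchestra"),
    ("Member_of", "Student_T", "orchestra"),
    ("Member_of", "Student_U", "orchestra"),   # member, but not enthusiastic
    ("Enthusiastic", "Student_B", "orchestra"),
    ("Enthusiastic", "Student_T", "orchestra"),
    ("BusyClub", "orchestra"),
    ("Obstructive_to", "study", "orchestra"),
}


def j_att(s, p, facts=FACTS):
    """J_att(s, p): independent of the object x."""
    return ("BusyClub", s) in facts and ("Obstructive_to", p, s) in facts


def j_obj(x, s, facts=FACTS):
    """J_obj(x, s): independent of the projected property p."""
    return ("Enthusiastic", x, s) in facts


def negligent_of(x, p, s, facts=FACTS):
    """Rule (6): J_obj(x,s), J_att(s,p) and Member_of(x,s) give Negligent_of(x,p)."""
    return j_obj(x, s, facts) and j_att(s, p, facts) and ("Member_of", x, s) in facts


target_is_negligent = negligent_of("Student_T", "study", "orchestra")
```

Because `j_att` never looks at x, it can be evaluated once per candidate base and reused for every target, which is one way to read the remark that the syntactic restriction helps prune the search for a base case.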
5.3 Knowledge Representation

Knowledge representation is one of the central issues in 
artificial intelligence research. Difficulty arises from the 
fact that there has been no single knowledge representa- 
tion scheme for representing various kinds of knowledge 
while still keeping the simplicity as well as the efficiency 
of their utilization. Logic was one of the most promising 
candidates but it was weak in representing structured 
knowledge and the changing world. Our aim in devel- 
oping a knowledge representation framework based on 
logic and logic programming is to solve both of these 
problems. From the structural viewpoint, we developed 
an extended relational database which can handle non- 
normal forms and its corresponding programming lan- 
guage, CRL [Yokota 88a]. This representation allows 
users to describe their databases in a structured way in 
the logical framework [Yokota et al. 88b]. 

Recently, we proposed a new logic-based knowledge 
representation language, Quixote [YasukawaYokota 90]. 
Quixote follows the ideas developed in CRL and CIL: 
it inherits object-orientedness from the extended version 
of CRL and partially specified terms from CIL. One of 
the main characteristics of the object-oriented features 
is the notion of object identity. In Quixote, not only 
simple data atoms but also complex structures are can- 
didates for object identifiers [Morita 90]. Even circular 
structures can be represented in Quixote. Non-well-founded
set theory [Aczel 88] was adopted as the mathematical
foundation for characterizing such objects, and
unification on infinite trees [Colmerauer 82] was
adopted as an implementation method.



6 Conclusion 

In this article, we summarized the basic research activi- 
ties of the FGCS project. We emphasized two different 
directions of logic programming research. One followed
logic programming languages, focusing on constraint
logic programming and concurrent logic programming.
The other followed basic research in artificial
intelligence and software engineering based on logic and 
logic programming. 

This project has been like solving a jigsaw puzzle. It 
is like trying to discover the hidden picture in the puzzle 
using logic and logic programming as clues. The research 
problems to be solved were derived naturally from this 
image. There were several difficult problems. For some 
problems, we did not even have the right evaluation stan- 
dard for judging the results. The design of GHC is such 
an example. Our entire picture of the project helped in 
guiding our research in the right direction. 

The picture is not completed yet. We need further 
efforts to fill in the remaining spaces. One of the most 
important parts to be added to this picture is the
integration of constraint logic programming and
concurrent logic programming. We mentioned our
preliminary language/system, GDCC, but it is not yet
mature. We need a really useful language which can be
executed efficiently on parallel hardware. Another
research subject to be pursued is the realization of a
database in KL1.
We are actively constructing a parallel database but it 
is still in the preliminary stages. We believe that there 
is much affinity between databases and parallelism and 
we expect a great deal of parallelism from database ap- 
plications. The third research subject to be pursued is 
the parallel implementation of abduction and induction. 
Recently, there has been much work on abduction and 
induction based on logic and logic programming frame- 
works. They are expected to provide a foundation for 
many research themes related to knowledge acquisition 
and machine learning. Also, both abduction and
induction require extensive symbolic computation and
therefore fit the PIM architecture very well.

Although further research is needed to make our re- 
sults really useful in a wide range of large-scale applica- 
tions, we feel that our approach is in the right direction. 
Abstract

This paper gives a concise introduction to the PIM
and its basic software, including the overall framework
of the project. An FGCS prototype system is now under
development. Its core is the parallel inference system,
which includes a parallel inference machine, PIM, and
its operating system, PIMOS. The PIM consists of five
hardware modules containing about 1,000 element
processors in total. On top of the parallel inference
system, there is a knowledge base management system
(KBMS). The PIMOS and KBMS form a software layer
called the basic software of the prototype system.
These systems are already running on the PIM. On top
of them, a higher-level software layer, called the
knowledge programming software, is being developed. It
is to be used as a tool for more powerful inference and
knowledge processing. It contains language processors
for constraint logic programming languages, parallel
theorem provers and natural language processing
systems. Several experimental application programs are
also being developed for both general evaluation of
the PIM and the exploration of new application fields
for knowledge processing. These achievements with the
PIM and its basic software easily surpass the research
targets set up at the beginning of the project.



1 Introduction 

Ten years have passed since the fifth generation
computer systems (FGCS) project was started in June
1982, and the project is approaching its goal. This
project assumed that “logic” was the theoretical
backbone of future knowledge information processing,
and adopted logic programming as the kernel programming
language of fifth generation computer systems. In
addition to the adoption of logic programming, highly
parallel processing for symbolic computation was
considered indispensable for implementing practical
knowledge information processing systems. Thus, the
project aimed to create a new computer technology
combining knowledge processing with parallel processing
using logic programming.

Now an FGCS prototype system is under develop- 
ment. This system integrates the major research achieve- 
ments of these 10 years so that they can be evaluated 
and demonstrated. Its core is called a parallel infer- 
ence system which includes a parallel inference ma- 
chine, PIM, and its operating system, PIMOS. The PIM 
includes five hardware modules containing about 1,000 
element processors in total. It also includes a language 
processor for a parallel logic language, KL1. 

On the parallel inference system, there is a 
knowledge base management system (KBMS). 
The KBMS includes a database management system 
(DBMS), Kappa-P, as its lower layer. The KBMS 
provides a knowledge representation language, Quixote, 
based on a deductive and object-oriented database.
The PIMOS and KBMS form a software layer called the
basic software of the prototype system. These systems
are already being run on the PIM. The PIM and basic 
software are now being used as a new research platform 
for building experimental parallel application programs. 
They are the most complete of their kind in the world. 

On this platform, a higher-level software layer is being 
developed. This is to be used as a tool for more power- 
ful inference and knowledge processing. It contains lan- 
guage processors for constraint logic programming lan- 
guages, parallel theorem provers, natural language pro- 
cessing systems, and so on. These software systems all 
include the most advanced knowledge processing tech- 
niques, and are at the leading edge of advanced software 
science. 

Several experimental application programs are also be- 
ing developed for both general evaluation of the PIM 
and the exploration of new application fields for knowl- 
edge processing. These programs include a legal reason- 
ing system, genetic information processing systems, and 
VLSI CAD systems. They are now operating on the 
parallel inference system, and indicate that parallel pro- 
cessing of knowledge processing applications is very ef- 
fective in shortening processing time and in widening the 
scope of applications. However, they also indicate that
more research is needed on parallel algorithms
and load balancing methods for symbol and knowledge 
processing. These achievements with the PIM and its 
basic software easily surpass the research targets set up 
at the beginning of the project. 

This paper aims at a concise introduction to the PIM 
and its basic software, including the overall framework 
of the project. This project is the first Japanese na- 
tional project that aimed at making a contribution to 
world computer science and the promotion of interna- 
tional collaboration. We have published our research 
achievements wherever possible, and distributed various 
programs from time to time. Through these activities, 
we have also been given much advice and help which 
was very valuable in helping us to attain our research 
targets. Thus, our achievements in the project are also 
the results of our collaboration with world researchers on 
logic programming, parallel processing and many related 
fields. 

2 Research Targets and Plan 
2.1 Scope of R & D 

The general target of the project is the development of 
a new computer technology for knowledge information 
processing. 

With “mathematical logic” as their theoretical
backbone, various research and development themes were
established on software and hardware technologies
focusing on knowledge and symbol processing. These
themes are grouped into the following three categories:

2.1.1 Parallel inference system 

The core portion of the project was the research and de- 
velopment of the parallel inference system which contains 
the PIM, a KL1 language processor, and the PIMOS. To 
make the goal of the project clear, an FGCS prototype
system was considered a major target. This was to be
built by integrating many experimental hardware and
software components developed around logic
programming.

The prototype system was defined as a parallel
inference system intended to have about 1,000 element
processors and to attain an execution speed of more
than 100M LIPS (Logical Inferences Per Second). It was
also intended to have a parallel operating system,
PIMOS, as part of the basic software, providing an
efficient parallel programming environment in which we
can easily develop various parallel application
programs for symbol and knowledge processing and run
them efficiently. Thus, this is regarded as the
development of a supercomputer for symbol and knowledge
processing.
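
As a rough check on how these two targets relate, the per-processor speed they imply can be estimated. This is a back-of-the-envelope figure that assumes ideal linear scaling across element processors, which the plan itself does not claim:

```latex
\[
\frac{100 \times 10^{6}\ \text{LIPS}}{1000\ \text{element processors}}
  = 1 \times 10^{5}\ \text{LIPS}
  = 100\text{K LIPS per element processor}
\]
```

Any loss to communication or load imbalance would mean each element processor has to exceed this figure for the system as a whole to reach 100M LIPS.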

It was intended that overall research and development 
activities would be concentrated so that the major re- 
search results could be integrated into a final prototype 
system, step by step, over the timespan allotted to the 
project. 

2.1.2 KBMS and knowledge programming software

Themes in this category aimed to develop a basic soft- 
ware technology and theory for knowledge processing. 

• Knowledge representation and knowledge base man- 
agement 

• High-level problem solving and inference software 

• Natural language processing software 

These research themes were intended to create new 
theories and software technologies based on mathemat- 
ical logic to describe various knowledge fragments 
which are parts of “natural” knowledge bases pro- 
duced in our social systems. We also intended to store 
them in a computer system as components of “artifi- 
cial” knowledge bases so that they can be used to 
build various intelligent systems. 

To describe the knowledge fragments, a knowledge rep- 
resentation language has to be provided. It can be re- 
garded as a very high-level programming language exe- 
cuted by a sophisticated inference mechanism which is 
much cleverer than the parallel inference system. Nat- 
ural language processing research is intended to cover 
research on knowledge representation methods and such
inference mechanisms, in addition to research on easy- 
to-use man-machine interface functions. Experimental 
software building for some of these research themes was 
done on the sequential inference machines because the 
level of research was so basic that computational power 
was not the major problem. 

2.1.3 Benchmarking and evaluation systems 

• Benchmarking software for the parallel inference 
system 

• Experimental parallel application software 

To carry out research on an element technology in com- 
puter science, it is essential that an experimental soft- 
ware system is built. Typical example problems can then 
be used to evaluate theories or methods invented in the
course of the research.

To establish general methods and technologies for 
knowledge processing, experimental systems should be 
developed for typical problems which need to process 
knowledge fragments as sets of rules and facts. 

These problems can be taken from engineering sys- 
tems, including machine design and the diagnosis of ma- 
chine malfunction, or from social systems such as medical 
care, government services, and company management. 



Generally, the exploitation of computer technology for 
knowledge processing is far behind that for scientific cal- 
culation. Recent expert systems and machine translation 
systems are examples of the most advanced knowledge 
processing systems. However, their knowledge bases
contain only several hundred rules and facts on
average.

This scale of knowledge base may not be large enough
to evaluate the maximum power of a parallel inference
system having about 1,000 element processors. Thus,
research and development on large-scale application
systems is necessary not only for knowledge processing
research but also for the evaluation of the parallel
inference system. Such application systems should be
sought widely in many new fields.

The scope of research and development in this project
is very wide; however, the parallel inference system is
central to the whole project and is a very clear
research target. Software research and development must
also cover diverse areas of recent software technology,
but it has “logic” as its common backbone.

It was also intended that major research achievements 
should be integrated into one prototype system. This has 
made it possible for us to organize all of our research and 
development in a coherent way. At the beginning of the
project, only the parallel inference machine was
defined clearly as a target. The other research targets
described above were not planned at the beginning of
the project; they were added in the middle of the
intermediate stage or at the final stage.

2.2 Overall R & D plan

After three years of study and discussions on determining 
our major research fields and targets, the final research 
and development plan was determined at the end of fiscal 
1981 with the budget for the first fiscal year. 

At that time, practical logic programming languages 
had begun to be used in Europe mainly for natural lan- 
guage processing. The feasibility and potential of logic 
languages had not been recognized by many computer 
scientists. Thus, there was some concern that the level 
of language was too high to describe an operating sys- 
tem, and that the overhead of executing logic programs 
might be too large to use it for practical applications. 
This implies that research on logic programming was in 
its infancy. 

Research on parallel architectures linked with high- 
level languages was also in its infancy. Research on 
dataflow architectures was the most advanced at that 
time. Some dataflow architectures were thought to have
the potential for knowledge and symbol processing.
However, their feasibility for practical applications
had not yet been evaluated.

Most of the element technologies necessary to build the 
core of the parallel inference system were still in their in- 
fancy. We then tried to define a detailed research plan 
step by step for the 10-year project period. We divided 
the 10-year period into three stages, and defined the re- 
search to be done in each stage as follows: 

• Initial stage (3 years):
  - Research on potential element technologies
  - Development of research tools

• Intermediate stage (4 years):
  - First selection of major element technologies for
    final targets
  - Experimental building of medium-scale systems

• Final stage (3 years):
  - Second selection of major element technologies for
    final targets
  - Experimental building of a final full-scale system

At the beginning of the project, we made a detailed 
research and development plan only for the initial stage. 
We decided to make detailed plans for the intermediate
and final stages at the end of the preceding stage, so
that the plans would reflect the achievements of that
stage. The research budget and manpower were to be
decided depending on the achievements. It was likely 
that the project would effectively be terminated at the 
end of the initial stage or the intermediate stage. 



3 Inference System in the Initial 
Stage 

3.1 Personal Sequential Inference Machine (PSI-I)

To actually build the parallel inference system, especially 
a productive parallel programming environment which is 
now provided by PIMOS, we needed to develop various 
element technologies step by step to obtain hardware and 
software components. On the way toward this develop- 
ment, the most promising methods and technologies had 
to be selected from among many alternatives, followed by 
appropriate evaluation processes. To make this selection 
reliable and successful, we tried to build experimental 
systems which were as practical as possible. 

In the initial stage, to evaluate the descriptive power
and execution speed of logic languages, a personal
sequential inference machine, PSI, was developed. This
was a logic
programming workstation. This development was also 
aimed at obtaining a common research tool for software 
development. The PSI was intended to attain an execu- 
tion speed similar to DEC10 Prolog running on a DEC20 
system, which was the fastest logic programming system 
in the world. 

To begin with, a PSI machine language, KL0, was
designed based on Prolog. Then a hardware system was
designed for KL0. We employed a tagged architecture for
the hardware system. Then we designed a system
description language, ESP, which is a logic language
with class and inheritance mechanisms for building
program modules efficiently [Chikayama 1984]. ESP was
used not only to write the operating system for PSI,
which is named SIMPOS, but also to write many
experimental software systems for knowledge processing
research.

The development of the PSI machine and SIMPOS was 
successful. We were impressed by the very high software 
productivity of the logic language. The execution speed 
of the PSI was about 35K LIPS and exceeded its target. 
However, we realized that we could improve its architec- 
ture by using the optimization capability of a compiler 
more effectively. We produced about 100 PSI machines 
to distribute as a common research tool. This version of 
the PSI is called PSI-I. 

In conjunction with the development of PSI-I and
SIMPOS, research on parallel logic languages was
actively pursued. In those days, pioneering efforts were
being made on parallel logic languages such as PARLOG
[Clark and Gregory 1984] and Concurrent Prolog
[Shapiro 1983]. We learned much from this pioneering
research, and aimed to obtain a simpler language more
suited to being the machine language of a parallel
inference machine. Near the end of the initial stage, a
new parallel logic language, GHC, was designed
[Ueda 1986].
Table 1: Development of Inference Systems




3.2 Effect of PSI development on the 
research plan 

The experience gained in the development of PSI-I and 
SIMPOS heavily affected the planning of the intermedi- 
ate stage. 

3.2.1 Efficiency in program production 

One of the important questions related to logic language 
was the feasibility of writing an operating system which 
needs to describe fine detailed control mechanisms. An- 
other was its applicability to writing large-scale pro- 
grams. SIMPOS development gave us answers to these 
questions. The SIMPOS has a multi-window-based user 
interface, and consists of more than 100,000 ESP pro- 
gram lines. It was completed by a team of about 20 
software researchers and engineers over about two years. 
Most of the software engineers were not familiar with 
logic languages at that time. 

We found that logic languages have much higher 
productivity and maintainability than conventional von 
Neumann languages. This was obvious enough to con- 
vince us to describe a parallel operating system also in a 
logic language. 

3.2.2 Execution performance 

The PSI-I hardware and firmware attained about 35K 
LIPS. This execution speed was sufficient for most
knowledge processing applications. The PSI had an 80 MB
main memory, which was very large compared to mainframe
computers at that time. We found that this large memory
and fast execution speed made a logic language a
practical and highly productive tool for software
prototyping.

The implementation of the PSI-I hardware required 
11 printed circuit boards. As the amount of hardware 
became clear, we established that we could obtain an 
element processor for a parallel machine if we used VLSI 
chips for implementation. 

For the KL0 language processor which was imple- 
mented in the firmware, we estimated that better op- 
timization of object code made by the compiler would 
greatly improve execution speed. (Later, this
optimization was achieved by introducing the “WAM”
code [Warren 1983].)

The PSI-I and SIMPOS proved that logic languages 
are a very practical and productive vehicle for complex 
knowledge processing applications. 

4 Inference Systems in the Intermediate Stage

4.1 A parallel inference system 

4.1.1 Conceptual design of KL1 and PIMOS 

The most important target in the intermediate stage was 
a parallel implementation of a KL1 language processor, 
and the development of a parallel operating system, PI- 
MOS. 

The full version of GHC was still too complex for
machine implementation. A simpler version, FGHC, was
designed [Chikayama and Kimura 1985]. Finally, a
practical parallel logic language, KL1, was designed
based on FGHC.

KL1 is a parallel language classified as an
AND-parallel logic programming language. Its language
processor includes an automatic memory management
mechanism and a dataflow process synchronization
mechanism. These mechanisms were considered essential 
for writing and compiling large parallel programs. The 
first problem was whether they could be implemented ef- 
ficiently. The second problem was what kind of firmware 
and hardware support would be possible and effective. 

In addition to problems in implementing the KL1 lan- 
guage processor, the design of PIMOS created several im- 
portant problems. The role of PIMOS is different from 
that of conventional operating systems. PIMOS does 
not need to do primary process scheduling and mem- 
ory management because these tasks are performed by 
the language processor. It still has to perform resource 
management for main memory and element processors, 
and control the execution of user programs. However, a 
much more difficult role was added. It must allow a user
to divide a job into processes that can be executed in
parallel and distribute them to many element processors.
Processor loads must be well balanced to attain better
execution performance. In knowledge and symbol
processing applications, the dynamic structure of a
program is not regular and is difficult to estimate. It
was desirable that PIMOS could offer some support for
efficient job division and load balancing.

These problems in the language processor and the op- 
erating system were very new, and had not been studied 
as practical software problems. To solve these problems,
we realized that we must have appropriate parallel
hardware as a platform on which to carry out practical
software experiments by trial and error.

4.1.2 PSI-II and Multi-PSI system 

In conjunction with the development of KL1 and PI- 
MOS, we needed to extend our research and develop new 
theories and software technologies for knowledge process- 
ing using logic programming. This research and develop- 
ment demanded improvement of PSI-I machines in such 
aspects as performance, memory size, cabinet size, disk 
capacity, and network connection. 

We decided to develop a smaller and
higher-performance model of PSI, to be called PSI-II. This
was intended to provide a better workstation for use as 
a common tool and also to obtain an element processor 
for the parallel hardware to be used as a platform for 
parallel software development. This hardware was called 
a multi-PSI system. It was regarded as a small-scale 
experimental version of the PIM. As many PSI-II ma- 
chines were produced, we anticipated having very stable 
element processors for the multi-PSI system. 

The PSI-II used VLSI gate array chips for its CPU.
The size of the cabinet was about one sixth that of PSI-I.
Its execution speed was 330 KLIPS, about 10 times
faster than that of PSI-I. This improvement was attained
mainly through better compiler optimization
techniques and improvements to its machine architecture.
The main memory size was also expanded to
320 MB so that prototyping of large applications could
be done quickly.

In the intermediate stage, many experimental systems 
were built on PSI-I and PSI-II systems for knowledge 
processing research. These included small-to-medium 
scale expert systems, a natural language discourse un- 
derstanding system, constraint logic programming sys- 
tems, a database management system, and so on. These 
systems were all implemented in the ESP language using 
about 300 PSI-II machines distributed to the researchers, 
as their personal tools. 

The development of the multi-PSI system was com- 
pleted in the spring of 1988. It consists of 64 element pro- 
cessors which are connected by an 8 by 8 mesh network. 
One element processor is contained in three printed cir- 
cuit boards. Eight element processors are contained in 
one cabinet. Each element processor has an 80 MB main
memory. Thus, a multi-PSI system has about 5 GB of
main memory in total. This hardware was very stable, as
we had expected. We produced 6 multi-PSI systems and
distributed them to the main research sites.
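
The configuration figures in this paragraph can be checked with a short calculation. The following Python sketch is illustrative only: the row-major numbering of element processors, the absence of wrap-around links, and all function names are assumptions made for the example and are not taken from the multi-PSI design.

```python
# Illustrative sketch only: PE numbering and helper names are assumptions.
MESH_ROWS, MESH_COLS = 8, 8   # 8 by 8 mesh, from the text
MEMORY_PER_PE_MB = 80         # main memory per element processor, from the text


def pe_count() -> int:
    """Number of element processors in the mesh."""
    return MESH_ROWS * MESH_COLS


def total_memory_mb() -> int:
    """Main memory summed over all element processors."""
    return pe_count() * MEMORY_PER_PE_MB


def coords(pe: int) -> tuple[int, int]:
    """Row and column of a PE, assuming row-major numbering from 0."""
    return divmod(pe, MESH_COLS)


def hops(src: int, dst: int) -> int:
    """Minimum number of mesh links between two PEs (no wrap-around assumed)."""
    (r1, c1), (r2, c2) = coords(src), coords(dst)
    return abs(r1 - r2) + abs(c1 - c2)


if __name__ == "__main__":
    print(pe_count(), "PEs,", total_memory_mb(), "MB in total")
    print("corner-to-corner hops:", hops(0, pe_count() - 1))
```

The total of 64 x 80 MB = 5,120 MB agrees with the figure of about 5 GB given above.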

4.1.3 KL1 language processor and PIMOS 

This was the first trial implementation of a distributed 
language processor of a parallel logic language, and a 
parallel operating system on real parallel hardware, used 
as a practical tool for parallel knowledge processing ap- 
plications. 

The KL1 distributed language processor was an inte-
gration of various complex functional modules such as a
distributed garbage collector for loosely-coupled memo-
ries. The automatic process synchronization mechanism
based on the dataflow model was also difficult to imple-
ment over the distributed element processors. Parts of
these mechanisms had to be implemented in combination with
some PIMOS functions such as a dynamic on-demand
loader for object program codes. Other important func-
tions related to the implementation of the language pro-
cessor were support functions such as system debugging, sys-
tem diagnostics, and system maintenance functions.

In addition to these functions for the KL1 language 
processor, many PIMOS functions for resource manage- 
ment and execution control had to be designed and im- 
plemented step by step, with repeated partial module 
building and evaluation. 

This partial module building and evaluation was done
for core parts of the KL1 language processor and PIMOS,
using not only KL1 but also ESP and C languages. An
appropriate balance between the functions of the lan-
guage processor and the functions of PIMOS was con-
sidered. The language processor was first implemented in
PSI-II firmware. It worked as a pseudo
parallel simulator of KL1, and was used as a PIMOS

• Machine language: KL1-b

• Max. 64PEs and two FEPs (PSI-II) connected to LAN 

• Architecture of PE: 

- Microprogram control (64 bits/word) 

- Machine cycle: 200ns, Reg.file: 64W 

- Cache: 4 KW, set associative/write-back 

- Data width: 40 bits/word 

- Memory capacity: 16MW (80MB) 

• Network: 

- 2-dimensional mesh 

- 5MB/s x 2 directions/ch with 2 FIFO buffers/ch 

- Packet routing control function 



Figure 3: Multi-PSI System: Main features and Appearance 



development tool. It was eventually extended and trans- 
ported to the multi-PSI system. 

In the development of PIMOS, the first partial mod- 
ule building was done using the C language in a Unix 
environment. This system is a tiny subset of the KL1 
language processor and PIMOS, and is called the PI- 
MOS Development Support System (PDSS). It is now 
distributed and used for educational purposes. The 
first version of PIMOS was released on the PSI-II with 
the KL1 firmware language processor. This is called a 
pseudo multi-PSI system. It is currently used as a 
personal programming environment for KL1 programs. 

With the KL1 language processor fully implemented 
in firmware, one element processor or a PSI-II attained 
about 150 KLIPS for a KL1 program. It is interesting 
to compare this speed with that for a sequential ESP 
program. As a PSI-II attains about 300 KLIPS for a 
sequential ESP program, the overhead for KL1 caused 
by automatic process synchronization halves the execu- 
tion speed. This overhead is compensated for by effi- 
cient parallel processing. A full-scale multi-PSI system 
of 64 element processors could attain 5 - 10 MLIPS. This 
speed was considered sufficient for the building of exper- 
imental software for symbol and knowledge processing 
applications. On this system, simple benchmarking pro- 
grams and applications such as puzzle programs, a natu- 
ral language parser and a Go-game program were quickly 
developed. These programs and the multi-PSI system
were demonstrated at FGCS'88 [Uchida et al. 1988].
They proved that KL1 and PIMOS could be used as a
new platform for parallel software research.
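
The speed figures quoted in this paragraph can be related by simple arithmetic. The Python sketch below is a rough check, assuming ideal linear scaling as the upper bound; the function names are invented for the example, and the "about 150 KLIPS" figure is approximate, so the results should be read as estimates.

```python
# Rough arithmetic check; names are illustrative, inputs are the
# approximate figures quoted in the text.
PE_COUNT = 64             # element processors in a full-scale multi-PSI
KL1_KLIPS_PER_PE = 150    # approximate KL1 speed of one element processor
ESP_KLIPS_PER_PE = 300    # approximate sequential ESP speed on a PSI-II


def ideal_mlips(pe_count: int, klips_per_pe: float) -> float:
    """Aggregate speed if every PE ran at full speed with no overhead."""
    return pe_count * klips_per_pe / 1000.0


def parallel_efficiency(observed_mlips: float, pe_count: int,
                        klips_per_pe: float) -> float:
    """Observed speed as a fraction of the ideal aggregate speed."""
    return observed_mlips / ideal_mlips(pe_count, klips_per_pe)


if __name__ == "__main__":
    print("KL1 / ESP speed ratio:", KL1_KLIPS_PER_PE / ESP_KLIPS_PER_PE)
    print("ideal aggregate MLIPS:", ideal_mlips(PE_COUNT, KL1_KLIPS_PER_PE))
    print("efficiency at 5 MLIPS:",
          round(parallel_efficiency(5.0, PE_COUNT, KL1_KLIPS_PER_PE), 2))
```

Under these assumptions the ideal aggregate is 9.6 MLIPS, so the lower end of the quoted 5 to 10 MLIPS range corresponds to roughly half of the ideal.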

4.2 Overall design of the parallel inference system

4.2.1 Background of the design 

The first question related to the design of the parallel 
inference system was what kind of functions must be 
provided for modeling and programming complex prob- 
lems, and for making them run on large-scale parallel 
hardware. 

When we started this project, research on parallel pro- 
cessing still tended to focus on hardware problems. The 
major research and development interest was in SIMD 
or MIMD type machines applied for picture processing 
or large-scale scientific calculations. Those applications 
were programmed in Fortran or C. Control of parallel 
execution of those programs, such as job division and 
load balancing, was performed by built-in programs or 
prepared subroutine libraries, and could not be done by 
ordinary users. 

Those machines excluded most of the applications 
which include irregular computations and require gen- 
eral parallel programming languages and environments. 
This tendency still continues. Among these parallel ma- 
chines, some dataflow machines were exceptional and had 
the potential to have functional languages and their gen- 
eral parallel programming environment. 

We were confident that a general parallel programming
language and environment are indispensable for writing
parallel programs for large-scale symbol and knowledge
processing applications, and that they must provide the
following functions:

1. An automatic memory management mechanism for
distributed memories (parallel garbage collector)

2. An automatic process synchronization mechanism 
based on a dataflow scheme 

3. Various support mechanisms for attaining the best 
job division and load balancing. 

The first two are to be embedded in the language pro- 
cessor. The last is to be provided in a parallel operating 
system. All of these answer the question of how to write 
parallel programs and map them on parallel machines. 
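
The second function, synchronization based on a dataflow scheme, means that a process waiting for a value is suspended until another process supplies it, without explicit locks or signals in the program. The Python sketch below is only an analogy using threads and a standard-library Future as a single-assignment variable; it does not show how the KL1 language processor implements the mechanism, and the function names are invented for the example.

```python
# Analogy only: a Future stands in for an unbound single-assignment
# variable. This is not the KL1 implementation.
import threading
from concurrent.futures import Future


def consumer(x: Future, out: list) -> None:
    # Blocks here until x has a value; no explicit lock or signal is written.
    out.append(x.result() * 2)


def producer(x: Future) -> None:
    # Binding the variable wakes up every computation waiting on it.
    x.set_result(21)


def run() -> list:
    x = Future()   # created unbound
    out = []
    c = threading.Thread(target=consumer, args=(x, out))
    p = threading.Thread(target=producer, args=(x,))
    c.start()      # the consumer is started first and must wait
    p.start()
    c.join()
    p.join()
    return out


if __name__ == "__main__":
    print(run())
```

The consumer is started before the producer, yet the result is the same in either order, because the data dependency alone decides when the consumer proceeds.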

This mapping could be made fully automatic if we 
limited our applications to very regular calculations and 
processing. However, for the applications we intend, the 
mapping process, which includes job division and load- 
balancing, should be done by programmers using the 
functions of the language processor and operating sys- 
tem. 

4.2.2 A general parallel programming environment

The above mechanisms for mapping should be implemented
in the following three layers: 

1. A parallel hardware system consisting of element 
processors and inter-connection network (PIM hard- 
ware) 

2. A parallel language processor consisting of run-time 
routines, built-in functions, compilers and so on 
(KL1 language processor) 

3. A parallel operating system including a program- 
ming environment (PIMOS) 

At the beginning of the intermediate stage, we tried to 
determine the roles of the hardware, the language pro- 
cessor and the operating system. This was really the 
start of development. 

One idea was to aim at hardware with many functions,
using high-density VLSI technology, as described in
early papers on dataflow machine research. It was a very 
challenging approach. However, we thought it too risky 
because changes to the logic circuits in VLSI chips would 
have a long turn-around time even if the rapid advance of 
VLSI technology was taken into account. Furthermore, 
we thought it would be difficult to run hundreds of so- 
phisticated element processors for a few days to a few 
weeks without any hardware faults. 

Implementation of the language processor and the op-
erating system was thought to be very difficult too. As
there were no prior examples, we could not make any re-
liable quantitative estimation of the overhead caused by
these software systems. This implementation was there-
fore considered risky too.

Finally, we decided not to make an element proces-
sor too complex, so that our hardware engineers could
provide the software researchers with a large-scale hard- 
ware platform stable enough to make the largest-scale 
software experiments in the world. 

However, we tried to add cost-effective hardware sup-
port for KL1 to the element processor, in order to at-
tain a higher execution speed. We employed tag archi-
tecture to support the automatic memory management
mechanism as well as faster execution of KL1 programs.
The automatic synchronization mechanism was to be im-
plemented in firmware. Support for job division
and load balancing was implemented partially in
firmware as primitives of the KL1 language, but
chiefly by the operating system. In a
programming environment of the operating system, we
hoped to provide a semi-automatic load balancing mech-
anism as an ultimate research goal.

PIMOS and KL1 hide from users most of the archi-
tectural details of the element processors and network
system of PIM hardware. A parallel program is modeled
and programmed depending on a parallel model of an
application problem and algorithms designed by a pro-
grammer. The programmer has great freedom in divid-
ing programs because a KL1 program is basically con-
structed from very fine-grain processes.

As a second step, the programmer can decide the
grouping of fine-grain processes in order to obtain an ap-
propriate granularity as divided jobs, and then specify
how to dispatch them to element processors using a spe-
cial notation called “pragma”. This two-step approach
makes parallel programming easy and productive.
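
The two steps can be pictured with a small sketch. The Python code below is a loose illustration and not KL1: the task representation, the fixed group size, and the round-robin placement policy are all assumptions made for the example, and the processor number attached to each job merely plays the role that the text assigns to a pragma.

```python
# Loose illustration, not KL1. Grouping and placement policies are assumed.
from collections import defaultdict


def fine_grain_tasks(n: int) -> list:
    """Step 1: describe the problem as many small independent tasks."""
    return [(i, i * i) for i in range(n)]   # (task id, stand-in for work)


def assign(tasks: list, group_size: int, num_pes: int) -> dict:
    """Step 2: group tasks into coarser jobs and attach a target PE to each.

    The PE number attached here corresponds to the programmer's annotation;
    round-robin placement is just one possible choice.
    """
    placement = defaultdict(list)
    for start in range(0, len(tasks), group_size):
        job = tasks[start:start + group_size]
        pe = (start // group_size) % num_pes
        placement[pe].append(job)
    return dict(placement)


if __name__ == "__main__":
    for pe, jobs in sorted(assign(fine_grain_tasks(10), 3, 2).items()):
        print("PE", pe, "->", jobs)
```

Changing the group size or the placement rule alters only the second step; the description of the tasks themselves stays the same, which is the separation the two-step approach aims for.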

We decided to implement the memory management 
mechanism and the synchronization mechanism mainly 
in the firmware. The job division and load balancing 
mechanism was to be implemented in the software. We 
decided not to implement uncertain mechanisms in the 
hardware. 

The role of the hardware system was to provide a sta- 
ble platform with enough element processors, execution 
speed, memory capacity, number of disks and so on. The 
demands made on the capacity of a cache and a main 
memory were much larger than those of a general pur- 
pose microprocessor of that time. The employment of 
tag architecture contributed to the simple implementa- 
tion of the memory management mechanism and also 
increased the speed of KL1 program execution.

5 R & D in the final stage

5.1 Planning of the final stage 

At the end of the intermediate stage, an experimen- 
tal medium-scale parallel inference system consisting of 
the multi-PSI system, the KL1 language processor, and 
PIMOS was successfully completed. On this system, 
several small application programs were developed and 
run efficiently in parallel. This proved that symbol and 
knowledge processing problems had sufficient parallelism 
and could be written in KL1 efficiently. This success en- 
abled us to enter the final stage. 

Based on research achievements and newly developed 
tools produced in the intermediate stage, we made a de- 
tailed plan for the final stage. One general target was to 
make a big jump from the hardware and software tech- 
nologies for the multi-PSI system to the ones for the 
PIM, with hundreds of element processors. Another gen-
eral target was to take on the challenge of parallel processing
of large and complex knowledge processing applications
which had never been tackled anywhere in the world,
using KL1 and the PIM.

Through the research and development directed to 
these targets, we expected that a better parallel pro- 
gramming methodology would be established for logic 
programming. Furthermore, the development of large 
and complex application programs would not only en- 
courage us to create new methods of building more in- 
telligent systems systematically but could also be used 
as practical benchmarking programs for the parallel in-
ference system. We intended to develop new techniques
and methodologies in the following areas:

1. Efficient parallel software technology 

(a) Parallel modeling and programming techniques 
-Parallel programming paradigms 
-Parallel algorithms 

(b) Efficient mapping techniques of parallel pro- 
cesses to parallel processors 

-Dynamic load balancing techniques 
-Performance debugging support 

2. New methodologies to build intelligent systems us- 
ing the power of the parallel inference system 

(a) Development of a higher-level reasoning or in- 
ference engine and higher-level programming 
languages 

(b) Methodologies for knowledge representation 
and knowledge base management (methodol- 
ogy for knowledge programming) 

The research and development themes in the final stage 
were set up as follows: 



1. PIM hardware development 

We intended to build several models with differ-
ent architectures so that we could compare map-
ping problems between the architectures and pro-
gram models. The total number of element processors for
all the modules was planned to be about 1,000.

2. The KL1 language processor for the PIM modules 

We planned to develop new KL1 language processors
which took the architectural differences among the PIM
modules into account.

3. Improvement and extension of PIMOS 

We intended to develop an object-oriented language, 
AYA, over KL1, a parallel file system, and extended 
performance debugging tools for its programming 
environment. 

4. Parallel DBMS and KBMS 

We planned to develop a parallel and distributed
database management system which, using several disk
drives connected to PIM element processors, was in-
tended to attain high throughput and consequently
a high information retrieval speed. As we had al-
ready developed a database management system,
Kappa-II, which employed a nested relational model
on the PSI machine, we decided to implement a par-
allel version of Kappa-II. However, we redesigned its
implementation, employing the distributed database
model and using KL1. This parallel version is called
Kappa-P. We plan to develop a knowledge base man-
agement system on Kappa-P. This would be
based on the deductive object-oriented DB, having
a knowledge representation language, Quixote.

5. Research on knowledge programming software 

We intended to continue various basic research ac- 
tivities to develop new theories, methodologies and 
tools for building knowledge processing application 
systems. These activities were grouped together as 
research on knowledge programming software. 

This included research themes such as a parallel 
constraint logic programming language, mathemat- 
ical systems including theorem provers, natural lan- 
guage processing systems such as a grammar design 
system, and an intelligent sentence generation sys- 
tem for man-machine interfacing. 

6. Benchmarking and experimental parallel applica- 
tion systems 

To evaluate the parallel inference system and the
various tools and methodologies developed in the
above themes, we decided to make more effort to
explore new applications of parallel knowledge pro-
cessing. We began research into a legal expert sys-
tem, a genetic information processing system, and
so on.

5.2 R & D results in the final stage 

The actual research activities on the themes described
above differed according to their characteristics. In the de-
velopment of the parallel inference system, we focused
on the integration of PIM hardware and some software 
components. In our research on knowledge programming 
software, we continued basic research and experimental 
software building to create new theories and develop par- 
allel software technologies for the future. 

5.2.1 PIM hardware and KL1 language processor

A role of the PIM hardware was to provide software re- 
searchers with an advanced platform which would allow 
large-scale software development for knowledge process- 
ing. 

Another role was to obtain various evaluation data
on the architecture and hardware structure of the ele-
ment processors and network systems. In particular, we
wanted to analyze the performance of large-scale parallel 
programs on various architectures (machine instruction 
sets) and hardware structures, so that hardware engi- 
neers could design more powerful and cost-effective par- 
allel hardware in the future. 

In the conceptual design of the PIM hardware, we real- 
ized that there were many alternative designs for the ar- 
chitecture of an element processor and the structure of a 
network system. For the architecture of an element pro- 
cessor, we could choose between a CISC type instruction 
set implemented in firmware and a RISC type instruction 
set. For the interconnection network, there were several
options, including a two-dimensional mesh network like
the multi-PSI, a cross-bar switch, and a common bus with
coherent caches.

To design the best hardware, we needed to find out the 
mapping relationships between program behavior and 
the hardware architectures and structures. We had to 
establish criteria for the design of the parallel hardware, 
reflecting the algorithms and execution structures of ap- 
plication programs. 

To gather the basic data we needed to obtain these de-
sign criteria, we tried to categorize our design choices
into five groups and build five PIM modules. The main
features of these five modules are listed in Table 2. The
number of element processors required for each module
was determined by the main purpose of the
module. Large modules have 256 to 512 element proces-
sors, and were intended to be used for software experi-
ments. Small modules have 16 or 20 element processors
and were built for architectural experiments and evalua-
tion.

All of these modules were designed to support KL1 
and PIMOS, so that software researchers could run one 
program on the different modules and compare and an- 
alyze the behaviors of parallel program execution. 

A PIM/m module employed architecture similar to 
the multi-PSI system. Thus, its KL1 language proces- 
sor could be developed by simply modifying and extend- 
ing that of the multi-PSI system. For other modules, 
namely PIM/p, PIM/c, PIM/k, and PIM/i, the KL1 
language processor had to be newly developed because 
all of these modules have a cluster structure. In a clus-
ter, four to eight element processors are tightly coupled
by a shared memory and a common bus with coherent
caches. While communication between element proces-
sors within a cluster is done through the common bus and
shared memory, communication between clusters is done
via a packet switching network. These four PIM modules
have different machine instruction sets.

We intended to avoid the duplication of development
work for the KL1 language processor. We used the KL1-C
language to write PIMOS and the usual application
programs. A KL1-C program is compiled into the KL1-B
language, which is similar to the “WAM”, as shown
in Figure 5. We defined an additional layer between
the KL1-B language and the real machine instructions.
This layer is called the virtual hardware layer. It has a
virtual machine instruction set called “PSL”. The spec-
ification of the KL1-B interpreter is described in PSL.
This specification is semi-automatically converted to a
real interpreter or runtime routines dedicated to each
PIM module. The specification in PSL is called a vir-
tual PIM processor (the VPIM processor for short) and
is common to four PIM modules.
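
The idea of writing one specification against a virtual instruction set and converting it for each target can be sketched as a table-driven expansion. The Python code below is a toy illustration only: the virtual instructions, the two targets, and their output mnemonics are all invented for the example and do not reproduce PSL, KL1-B, or any PIM instruction set.

```python
# Toy illustration. Every instruction name and target here is hypothetical.

# One shared specification, written once in virtual instructions.
SPEC = [
    ("load", "r1", "arg0"),
    ("deref", "r1"),
    ("store", "r1", "result"),
]

# Per-target expansion tables: each virtual instruction maps to a short
# sequence of target-specific instructions.
TARGET_A = {
    "load": lambda dst, src: [f"MOV {dst}, {src}"],
    "deref": lambda reg: [f"LD {reg}, [{reg}]"],
    "store": lambda src, dst: [f"MOV {dst}, {src}"],
}
TARGET_B = {
    "load": lambda dst, src: [f"lw {dst}, {src}"],
    "deref": lambda reg: [f"lw {reg}, 0({reg})", "nop"],
    "store": lambda src, dst: [f"sw {src}, {dst}"],
}


def expand(spec: list, table: dict) -> list:
    """Convert the shared specification into code for one target."""
    out = []
    for op, *args in spec:
        out.extend(table[op](*args))
    return out


if __name__ == "__main__":
    print(expand(SPEC, TARGET_A))
    print(expand(SPEC, TARGET_B))
```

In this arrangement only the expansion tables differ between targets, which is the kind of saving in development work that the virtual hardware layer is intended to provide.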

PIM/p, PIM/m and PIM/c are intended to be used 
for large software experiments; the other modules were 
intended for architectural evaluations. We plan to pro- 
duce a PIM/p with 512 element processors, and a PIM/m 
with 384 element processors. Now, at the beginning of 
March 1992, a PIM/m of 256 processors has just started 
to run a couple of benchmarking programs. 

We aimed at a processing speed of more than 100 
MLIPS for the PIM modules. The PIM/m with 256 pro- 
cessors will attain more than 100 MLIPS as its peak per- 
formance. However, for a practical application program, 
this speed may be much reduced, depending on the char- 
acteristics of the application program and the program- 
ming technique. To obtain better performance, we must 
attempt to augment the effect of compiler optimization 
and to implement a better load balancing scheme. We 
plan to run various benchmarking programs and exper- 
imental application programs to evaluate the gain and 
loss of implemented hardware and software functions.

Figure 4: Research Themes in the Final Stage



Table 2: Features of PIM modules

Item | PIM/p | PIM/c | PIM/m | PIM/i | PIM/k
Machine instructions | RISC-type + macro instructions | Horizontal microinstructions | Horizontal microinstructions | RISC-type | RISC-type
Target cycle time | 60 nsec | 65 nsec | 50 nsec | 100 nsec | 100 nsec
LSI devices | Standard cell | Gate array | Cell base | Standard cell | Custom
Process technology (line width) | 0.96 μm | 0.8 μm | 0.8 μm | 1.2 μm | 1.2 μm
Machine configuration | Multicluster connections (8 PEs linked to a shared memory) in a hypercube network | Multicluster connections (8 PEs + CC linked to a shared memory) in a crossbar network | Two-dimensional mesh network connections | Shared memory connections through a parallel cache | Two-level parallel cache connections
Number of PEs connected | 512 PEs | 256 PEs | 256 PEs | 16 PEs | 16 PEs








Figure 5: KL1 Language Processor and VPIM 




[Diagram: clusters connected by a multiple hypercube network]



• Machine language: KL1-b

• Architecture of PE and cluster 

- RISC + HLIC(Microprogrammed) 

- Machine cycle: 60ns, Reg.file: 40bits x 32W 

- 4 stage pipeline for RISC inst. 

- Internal Inst. Mem: 50 bits x 8 KW 

- Cache: 64 KB, 256 column, 4 sets, 32B/block 

- Protocol: Write-back, Invalidation 

- Data width: 40 bits/word 

- Shared Memory capacity: 256 MB

• Max. 512 PEs, 8 PE/cluster and 4 clusters/cabinet 

• Network: 

- Double hyper-cube (Max 6 dimensions) 

- Max. 20MB/sec in each link 



Figure 6: PIM model P: Main Features and Appearance of a Cabinet

• Machine language: KL1-b

• Architecture of PE:

- Microprogram control (64 bits/word x 32 KW)

- Data width: 40 bits/word

- Machine cycle: 60ns, Reg. file: 40 bits x 64W

- 5 stage pipeline

- Cache: 1 KW for Inst., 4 KW for Data

- Memory capacity: 16MW x 40 bits (80 MB)

• Max. 256 PEs, 32 PE/cabinet 

• Network: 

- 2-dimensional mesh 

- 4.2MB/s x 2 directions/ch 



Figure 7: PIM model M: Main Features and Appearance of four Cabinets 



5.2.2 Development of PIMOS 

PIMOS was intended to be a standard parallel operating 
system for large-scale parallel machines used in symbol 
and knowledge processing. It was designed as an in- 
dependent, self-contained operating system with a pro- 
gramming environment suitable for KL1. Its functions 
for resource management and execution control of user
programs were designed to be independent of the archi-
tectural details of the PIM hardware. They were imple-
mented based on an almost completely non-centralized
management scheme so that the design could be ap-
plied to a parallel machine with one million element
processors [Chikayama 1992].

PIMOS is completely written in KL1. Its manage- 
ment and control mechanisms are implemented using a 
“meta-call” primitive of KL1. The KL1 language pro- 
cessor has embedded an automatic memory management 
mechanism and a dataflow synchronization mechanism. 
The management and control mechanisms are then im- 
plemented over these two mechanisms. 

The resource management function is used to manage
the memory resources and processor resources allocated
to user processes and input and output devices. The pro-
gram execution control function is used to start and stop
user processes, control the order of execution following
priorities given to them, and protect system programs
from user program bugs, as conventional sequential
operating systems do.
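
The kind of control described here, running a user computation under a resource limit and containing its failures, can be sketched in a few lines. The Python code below is an illustration only and does not describe the interface of the KL1 “meta-call” primitive, which is not specified in this paper: the generator-based model of a user program, the step budget, and the status strings are all assumptions.

```python
# Illustration only: not the KL1 meta-call interface. A user program is
# modeled as a generator, and one produced value counts as one step.


def supervise(user_goal, step_budget: int):
    """Run a user computation under a step budget and contain its failures.

    Returns a (status, values produced so far) pair.
    """
    produced = []
    try:
        for value in user_goal():
            produced.append(value)
            if len(produced) >= step_budget:
                # Stop the user program when its allocation is used up.
                return "resource_exhausted", produced
    except Exception as exc:
        # A bug in the user program is reported, not propagated.
        return f"failed: {exc}", produced
    return "succeeded", produced


def well_behaved():
    yield from range(3)


def runaway():
    n = 0
    while True:
        yield n
        n += 1


def buggy():
    yield 1
    raise ValueError("user bug")


if __name__ == "__main__":
    for goal in (well_behaved, runaway, buggy):
        print(goal.__name__, supervise(goal, 5))
```

The supervisor itself keeps running in all three cases, which corresponds to the requirement that system programs be protected from user program bugs.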

PIMOS supports multiple users, access via networks,
and so on. It also has an efficient KL1 programming en-
vironment. This environment has some new tools for de-
bugging parallel programs, such as visualization programs
which show a programmer the status of load balancing in
graphical form, and other monitoring and measurement
programs.

5.2.3 Knowledge base management system 

The knowledge base management system consists of two 
layers. The lower layer is a parallel database manage- 
ment system, Kappa-P. Kappa-P is a database manage- 
ment system based on a nested relational model. It is 
more flexible than the usual relational database man- 
agement system in processing data of irregular sizes and 
structures, such as natural language dictionaries and bi- 
ological databases. 
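
A nested relation differs from an ordinary relation in that an attribute may itself hold a relation, so records of irregular size fit naturally. The Python sketch below illustrates the general idea with a made-up dictionary fragment; the data, the attribute names, and the unnest helper are assumptions for the example and do not represent the Kappa-P data model or interface.

```python
# General illustration of a nested relation; not the Kappa-P interface.
# The dictionary entries below are made-up sample data.
DICTIONARY = [
    {"headword": "run", "senses": [
        {"pos": "verb", "gloss": "move fast on foot"},
        {"pos": "noun", "gloss": "an act of running"},
    ]},
    {"headword": "parse", "senses": [
        {"pos": "verb", "gloss": "analyze a sentence"},
    ]},
]


def unnest(relation: list, attr: str) -> list:
    """Flatten one relation-valued attribute into ordinary flat rows."""
    flat = []
    for row in relation:
        outer = {k: v for k, v in row.items() if k != attr}
        for inner in row[attr]:
            flat.append({**outer, **inner})
    return flat


if __name__ == "__main__":
    for row in unnest(DICTIONARY, "senses"):
        print(row)
```

In the flat form the headword is repeated once per sense, whereas the nested form stores each entry as a single record however many senses it has.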

The upper layer is a knowledge base manage-
ment system based on a deductive object-oriented
database [Yokota and Nishio 1989]. This provides us
with a knowledge representation language, Quixote
[Yokota and Yasukawa 1992]. These upper and lower lay-
ers are written in KL1 and are now operational on PI-
MOS.

The development of the database layer, Kappa, was
started at the beginning of the intermediate stage.
Kappa aimed to manage the “natural databases” accu-
mulated in society, such as natural language dictionaries.
It employed a nested relational model so that it could 
easily handle data sets with irregular record sizes and 
nested structures. Kappa is suitable not only for nat- 
ural language dictionaries but also for DNA databases, 
rule databases such as legal data, contract conditions, 
and other “natural databases” produced in our social 
systems. 

The first and second versions of Kappa were developed 
on a PSI machine using the ESP language. The second 
version was completed at the end of the intermediate
stage, and was called Kappa-II [Yokota et al. 1988].

In the final stage, a parallel and distributed imple-
mentation of Kappa was begun. It is written in KL1
and is called Kappa-P. Kappa-P is intended to use large
PIM main memories for implementing the main memory
database scheme, and to obtain a very high throughput
rate for disk input and output by using many disks con-
nected in parallel to element processors.

In conjunction with the development of Kappa-II and
Kappa-P, research on a knowledge representation lan-
guage and a knowledge base management system was
conducted. After repeated experiments in design and im- 
plementation, a deductive object-oriented database was 
employed in this research. 

At this point the design of the knowledge represen- 
tation language, Quixote, was completed. Its language 
processor, which is the knowledge base management sys- 
tem, is under development. This language processor is 
being built over Kappa-P. Using Quixote, construction 
of a knowledge base can then be made continuously from 
a simple database. This will start with the accumulation 
of passive fact data, then gradually add active rule data, 
and will finally become a complete knowledge base. 

The Quixote and Kappa-P system is a new knowl- 
edge base management system which has a high-level 
knowledge representation language and the parallel and 
distributed database management system as the base of 
the language processor. The first versions of Kappa-P
and Quixote are now almost complete. It will be interesting
to see how this large system operates and how large its
overhead will be.

5.2.4 Knowledge programming software 

This software consists of various experimental programs
and tools built in the course of theoretical research and the
development of element technologies for knowledge process-
ing. Most of these programs and tools are written in
KL1. They can therefore be regarded as application
programs for the parallel inference system.

1. Constraint logic programming system 

In the final stage, a parallel constraint logic pro-
gramming language, GDCC, is being developed.
This language is a high-level logic language which
has a constraint solver as a part of its language
processor. The language processor is implemented
in KL1 and is intended to use parallel processing
to shorten its execution time. GDCC
is evaluated by experimental application programs
such as a program for designing a simple handling
robot [Aiba and Hasegawa 1992].

2. Theorem proving and program transformation 

A model generation theorem prover, MGTP, is be-
ing implemented in KL1. For this application, load
balancing has been optimized successfully: the
speedup from parallel processing is almost
proportional to the number of element processors
being used. This prover is being used as a rule-
based reasoner for a legal reasoning system. It en-
ables this system to use knowledge representation
based on first order logic, which contributes to easy
knowledge programming.

3. Natural language processing 

Software tools and linguistic databases are being
developed for use in implementing natural language
interfaces. The tools are integrated into a library called
the Language Tool Box (LTB). The LTB includes nat-
ural language parsers, sentence generators, and
linguistic databases and dictionaries including syn-
tactic rules and so on.

5.2.5 Benchmarking and experimental parallel 
application software 

This software includes benchmarking programs for the 
parallel inference system, and experimental parallel ap- 
plication programs which were built for developing paral- 
lel programming methodology, knowledge representation 
techniques, higher-level inference mechanisms and so on. 

In the final stage, we extended the application area 
to include larger-scale symbol and knowledge processing 
applications such as genetic information processing and 
legal expert systems. This was in addition to engineering
applications such as VLSI-CAD systems and diagnostic
systems for electronic equipment [Nitta 1992].

1. VLSI CAD programs 

Several VLSI CAD programs are being developed
for use in logic simulation, routing, and placement.
These programs are aimed at developing various parallel
algorithms and load balancing methods. As there
are sequential programs with functions similar
to these programs, we can compare the performance
of the PIM against that of conventional
machines.

2. Genetic information processing programs

Sequence alignment programs for proteins and a
protein folding simulation program are being developed.
Research on an integrated database for biological
data is also being carried out using Kappa.

3. A legal reasoning system 

This system infers possible judgments on a crime
using legal rules and past case histories. It uses
the parallel theorem prover, MGTP, as the core of the
rule-based reasoner. This system makes full use
of important research results of this project, namely,
the PIM, PIMOS, MGTP and high-level inference
and knowledge representation techniques.

4. A Go game playing system 

The search space of a Go game is too large for
any exhaustive search method. For human players,
many textbooks show typical sequences of stone
placements, which are called “Joseki” patterns.
This system has some of the Joseki patterns
and some heuristic rules as its knowledge base
for playing against a human player. It aims to
attain a 5 to 10 “kyuu” level.

The applications we have described all employ symbol 
and knowledge processing. The parallel programs have 
been programmed in KL1 in a short time. Particularly 
for the CAD and sequence alignment programs, the pro- 
cessing speed has improved almost proportionally to the 
number of element processors. 

However, as we can see in the Go playing system,
which is a very sophisticated program, the power of the
parallel inference system cannot always increase its
intelligence effectively. This implies that we cannot
effectively transcribe the “natural” knowledge bases written in
textbooks on Go into the data or rules of an “artificial”
knowledge base that would make the system
“clever”. We need to make more effort to find
better program structures and better algorithms that make
full use of the merits of parallel processing.

6 Evaluation of the parallel inference system

6.1 General purpose parallel programming environment

Practical problems in symbol and knowledge processing
applications have been programmed efficiently in KL1,
and solved quickly using a PIM which has several hundred
element processors. The productivity of parallel software
written in KL1 has proved to be much higher
than in any conventional language. This high productivity
is apparently a result of using the automatic memory
management mechanism and the automatic dataflow
synchronization mechanism.

Our method of specifying job division and load balanc- 
ing has been evaluated and proved successful. KL1 pro- 
gramming takes a two-step approach. In the first step, a 
programmer writes a program concentrating only on the 
program algorithms and a model. When the program is 
completed, the programmer adds the specifications for 
job division and load balancing using a notation called 
“pragma” as the second step. This separation makes the 
programming work simple and productive. 
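
The two-step approach can be illustrated with a minimal sketch. It is
written in Python rather than KL1, and every name in it (solve, run,
place, round_robin) is a hypothetical stand-in chosen for illustration:
the placement function plays the role that a pragma plays in a KL1
program, and the node count is an arbitrary assumption.

```python
# Step 1: the algorithm only. Nothing here mentions processors.
def solve(task):
    """Stand-in for one unit of work (here: squaring a number)."""
    return task * task


def run(tasks, num_nodes, place=lambda index, n: 0):
    """Execute every task, recording which node each one was mapped to.

    `place` is a hypothetical stand-in for a pragma: it only decides
    where work goes and never changes what is computed.
    """
    assignment = {node: [] for node in range(num_nodes)}
    for index in range(len(tasks)):
        assignment[place(index, num_nodes)].append(index)

    results = [None] * len(tasks)
    for node, indices in assignment.items():
        for index in indices:
            results[index] = solve(tasks[index])

    load = {node: len(indices) for node, indices in assignment.items()}
    return results, load


# Step 2: the mapping, added after the algorithm is complete.
def round_robin(index, n):
    """Spread tasks evenly over the n nodes."""
    return index % n


if __name__ == "__main__":
    tasks = list(range(8))
    print(run(tasks, 4))                     # everything on node 0
    print(run(tasks, 4, place=round_robin))  # same results, balanced load
```

Changing the placement function changes only the load per node, not the
results, which is the property that lets the mapping be tuned separately
from the algorithm.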

The specification of the KL1 language has been evalu- 
ated as practical and adequate for researchers. However, 
we realize that application programmers need a simpler 
and higher-level KL1 language specification which is a 
subset of KL1. In the future, several application-oriented 
KL1 language specifications should be provided, just as 
the von Neumann language set has a variety of different 
languages such as Fortran, Pascal and Cobol.

6.2 Evaluation of KL1 and PIMOS 

The functions of PIMOS, some of which are implemented 
as KL1 functions, have been proved to be effective for 
running and debugging user programs on parallel hard- 
ware. The resource management and execution mech- 
anisms in particular work as we had expected. For in- 
stance, priority control of user processes permits pro- 
grammers to use about 4,000 priority levels and enables 
them to write various search algorithms and speculative 
computations very easily. We are convinced that
KL1 and PIMOS will be the best practical example for
general purpose parallel operating systems in the future.
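
How priority levels support speculative computation can be sketched as
follows. This is an illustration in Python, not the PIMOS interface: the
class name, its methods and the figure of 4,000 levels used as a default
are assumptions made for the example.

```python
import heapq
import itertools


class PriorityScheduler:
    """Toy scheduler: the highest-priority ready process runs first."""

    def __init__(self, levels=4000):  # assumed figure, for illustration
        self.levels = levels
        self._heap = []
        self._order = itertools.count()  # keeps FIFO order within a level

    def spawn(self, priority, name):
        if not 0 <= priority < self.levels:
            raise ValueError("priority out of range")
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._order), name))

    def run(self):
        """Return process names in the order they would be executed."""
        executed = []
        while self._heap:
            _, _, name = heapq.heappop(self._heap)
            executed.append(name)
        return executed


if __name__ == "__main__":
    sched = PriorityScheduler()
    sched.spawn(10, "speculative branch A")   # runs only when idle
    sched.spawn(3000, "mandatory work")
    sched.spawn(10, "speculative branch B")
    print(sched.run())
```

Giving speculative branches a low priority means they consume processor
time only when no mandatory work is ready, and a best-first search can
be obtained in the same way by deriving the priority from a heuristic
score.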

6.3 Evaluation of hardware support 
for language functions 

In designing the PIM hardware and the KL1 language
processor, we thought it more important to provide a usable
and stable platform which has a sufficient number of
element processors for parallel software experiments than
to build many dedicated functions into the element processor.
The only dedicated hardware support built into
the element processor was tag architecture. Instead, we
added more support for the interconnection between element
processors such as message routing hardware and
a coherent cache chip.

We did not embed complex hardware support, such as
the matching store of a dataflow machine or a
content-addressable memory. We thought it risky because
implementing such complex hardware would take a
long turnaround time even with very advanced VLSI
technology. We also considered that we would have to create
new optimization techniques for a compiler dedicated to
the embedded complex hardware support, and that this
would not be easy either.

The completion of PIM hardware is now one year be- 
hind the original schedule, mainly because we had many 
unexpected problems in the design of the random logic 
circuits, and in submicron chip fabrication. If we had 
employed a more complex design for the element pro- 
cessor, the PIM hardware would have been further from 
completion. 

6.3.1 Comparison of PIM hardware with commercially
available technology

Rapid advances have been made in RISC processors recently.
Furthermore, a few MIMD parallel machines
which use a RISC processor as their element processor
have started to appear on the market. When we began
to design the PIM element processor, the performance
of both RISC and CISC processors was as low as a few
MIPS. At that time, a dedicated processor with tag
architecture could attain better performance. However,
some RISC processors have now attained more than 50
MIPS. It is interesting to evaluate these RISC processors
for KL1 program execution speed.

We usually compare the execution speed of a PIM ele- 
ment processor to that of a general-purpose microproces- 
sor, regarding 1 LIPS as approximately equivalent to 100 
IPS. This means that a 500 KLIPS PIM element proces- 
sor should be comparable to a 50 MIPS microprocessor. 
However, the characteristics of KL1 program execution 
are very different from those of the usual benchmark pro- 
grams for general-purpose microprocessors. 
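
The rule of thumb above amounts to a simple unit conversion. The short
Python sketch below only restates that arithmetic; the function and
constant names are hypothetical, and the factor of 100 is the rough
equivalence quoted in the text, not a measured value.

```python
# Rough equivalence quoted in the text: 1 LIPS is about 100 IPS.
INSTRUCTIONS_PER_INFERENCE = 100


def klips_to_mips(klips):
    """Convert a KLIPS rating to an approximately equivalent MIPS rating."""
    inferences_per_second = klips * 1_000
    instructions_per_second = inferences_per_second * INSTRUCTIONS_PER_INFERENCE
    return instructions_per_second / 1_000_000


if __name__ == "__main__":
    print(klips_to_mips(500))  # 500 KLIPS element processor
```

For a 500 KLIPS element processor this gives 50 MIPS. As the paragraph
above notes, the conversion says nothing about cache behaviour or code
size, so it is only a first approximation.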

The locality of memory access patterns for practical
KL1 programs is lower than for standard programs. As
the object code for a RISC instruction set has to be
longer than that for a CISC or dedicated instruction
set processor, the cache miss ratio will be greater.
Therefore, a simple comparison between the PIM element
processor and some recent RISC chips using announced peak
performance is not meaningful. A practical
implementation of the KL1 language processor on a typical
RISC processor is necessary for a proper comparison.

Most of the MIMD machines currently on the market 
lack a general parallel programming environment. The 
porting of the KL1 language processor may allow them 
to employ new scientific applications as well as symbol 
and knowledge processing applications. 

In future processor design, we believe that a general
purpose microprocessor should have tag architecture
support as a part of its standard functions.

6.3.2 Evaluation of high-level programming 
overhead 

Parallel programming in KL1 is very productive, especially
for large-scale and complex problems. The control
of job division and load balancing works well for hundreds
of element processors. No conventional language
is so productive. However, if we compare the process- 
ing speed of a KL1 program with that of a conventional 
language program with similar functions within a single 
element processor, we find that the KL1 overhead is not 
so small. This is a common trade-off problem between 
high-level programming and low-level programming. 

One straightforward method of compensating is to 
provide a simple subroutine call mechanism to link C 
language programs to KL1 programs. Another method 
is to improve the optimization techniques of compilers. 
This method is more elegant than the first. Further research
on optimization techniques should be undertaken.



7 Conclusion 

It is obvious that a general-purpose parallel program- 
ming language and environment is indispensable for solv- 
ing practical problems of knowledge and symbol process- 
ing. The straightforward extension of conventional von 
Neumann languages will not allow the use of hundreds 
of element processors except for regular scientific calcu- 
lations. 

We anticipated difficulties in the efficient implementation
of the automatic memory management and synchronization
mechanisms. However, this has now been
achieved. The productivity and maintainability of KL1 are
much higher than we expected. This more than compensates
for the overhead in high-level language programming.

Several experimental parallel application programs on 
the parallel inference system have proved that most 
large-scale knowledge processing applications contain po- 
tential parallelism. However, to make full use of this par- 
allelism, we need to have more parallel algorithms and 
paradigms to actually program the applications. 

The research and development targets of this FGCS
project have been achieved, especially as regards the parallel
inference system. We plan to distribute the KL1
language processor and PIMOS as free software or public
domain software, expecting that they will be ported
to many MIMD machines, and will provide a research
platform for future knowledge processing technology.
Abstract

The parallel inference machine, PIM, is the prototype
hardware system in the Fifth Generation Computer Systems
(FGCS) project. The PIM system aims at establishing
the basic technologies for large-scale parallel machine
architecture, efficient kernel language implementation
and many aspects of parallel software that will
be required for high performance knowledge information
processing in the 21st century. The PIM system also
supports an R & D environment for parallel software,
which must extract the full power of the PIM hardware.

The parallel inference machine PIM is a large-scale
parallel machine with a distributed memory structure.
The PIM is designed to execute a concurrent logic programming
language very efficiently. The features of the
concurrent logic language, its implementation, and the
machine architecture are suitable not only for knowledge
processing, but also for more general large problems
that give rise to dynamic and non-uniform computation.
Those problems have not been covered by commercial
parallel machines and their software systems targeting
scientific computation. The PIM system focuses on this
new domain of parallel processing.

There are two purposes to this paper. One is to report 
an overview of the research and development of the PIM 
hardware and its language system. The other is to clarify 
and itemize the features and advantages of the language, 
its implementation and the hardware structure with the 
view that the features are strong and indispensable for 
efficient parallel processing of large problems with dy- 
namic and non-uniform computation. 



1 Introduction 

The Fifth Generation Computer Systems (FGCS)
project aims at establishing basic software and hardware
technologies that will be needed for high-performance
knowledge information processing in the 21st century.
The parallel inference machine PIM is the prototype
hardware system and offers gigantic computation power
for knowledge information processing. The PIM system
includes an efficient language implementation of
KL1, which is the kernel language and a unique interface
between hardware and software.

Figure 1: Overview of the PIM System

Logic programming was chosen as the common basis of
research and development for the project. The primary
working hypothesis was as follows: “Many problems of
future computing, such as execution efficiency (of parallel
processing), descriptive power of languages, software
productivity, etc., will be solved dramatically with the
total reconstruction of those technologies based on logic
programming.”

Following the working hypothesis, R & D on the PIM
system started from scratch with the construction of
hardware, system software, a language system, application
software and programming paradigms, all based
on logic programming. Figure 1 gives an overview of the
system structure.

The kernel language KL1 was first designed for efficient
concurrent programming and parallel execution
of knowledge processing problems. Then, R & D on the
PIM hardware with distributed-memory MIMD architecture
and the KL1 language implementation on it were
carried out, both aiming at efficient KL1 execution in
parallel. A machine with roughly 1000 processors was
the primary target. Each of these processors was to be a
high-speed processor with hardware support for symbolic
processing. The PIM system also focused on realizing a
useful R & D environment for parallel software which
could extract the real computing power of the PIM. The
preparation of a good R & D environment was an important
project policy.

KL1 is a concurrent logic programming language pri- 
marily targeting knowledge processing. Since the lan- 
guage had to be a common basis for various types of 
knowledge processing, it became a general-purpose con- 
current language suitable for symbolic processing, with- 
out shifting to a specific reasoning mechanism or a cer- 
tain knowledge representation paradigm. 

Our R & D led to the language features of KL1 being 
very suitable for covering the dynamic and non-uniform 
large problems that are not covered by commercial par- 
allel computers and their software systems for scientific 
computation. Most knowledge processing problems are 
included in the problem domain of dynamic and non- 
uniform computation. The PIM hardware and the KL1 
language implementation support the efficiency of the 
language features. Thus, the PIM system covers this 
new domain of parallel processing. 

This paper focuses on two subjects. One is the R & D 
report of the PIM hardware and the KL1 language imple- 
mentation on it. The other is to clarify and itemize the 
features and advantages of the language, its implementa- 
tion and the hardware structure with the view that the 
features are strong and indispensable for efficient paral- 
lel processing of large problems with dynamic and non- 
uniform computation. Any parallel processing system 
targeting this problem domain must consider those fea- 
tures. 

Section 2 scans the R & D history of parallel process- 
ing systems in the FGCS project, with explanation of 
some of the keywords. Section 3 characterizes the PIM 
system. Many advantageous features of the language, its 
parallel implementation and hardware structure are de- 
scribed with the view that the features are strong and 
indispensable for efficient programming and execution of 
the dynamic and non-uniform large problems. Section 
4 presents the machine architecture of PIM. Five differ- 
ent models have been developed for both research use 
and actual software development. Some hardware spec- 
ifications are also reported. Section 5 briefly describes 
the language implementation methods and techniques, 
to give a concrete image of several key features of the 
KL1 implementation. Section 6 reports some measurements
and evaluation, mainly focusing on a low-cost
implementation of small-grain concurrent processes and
remote synchronization, which support the advantageous
features of KL1. Overall efficiency, as demonstrated by
a few benchmark programs, is shown, including the most
recent measurements on PIM/m. Then, section 7
concludes this paper.

Several important research issues of parallel software 
are reported in other papers: the parallel operating sys- 
tem PIMOS is reported in [Chikayama 1992] and the 
load balancing techniques controlled by software are
reported in [Nitta et al. 1992].

2 R & D History 

This section shows the R & D history of parallel pro- 
cessing systems in the FGCS project. Important re- 
search items and products of the R & D are described 
briefly, with explanations of several keywords. There 
are related reports for further information [Uchida 1992] 
[Uchida et al. 1988]. 

2.1 Start of the Mainstream of R & D 

The mainstream of R & D on the parallel processing systems
started at the beginning of the intermediate stage of the
FGCS project, in 1985. Just before that time, a concurrent
logic language, GHC [Ueda 1986], had been designed,
and it was chosen as the kernel language of the R & D.
Its language features will be described in section 3.4.

Development of small hardware and software systems
was started based on the kernel language GHC as the
hardware and software interface. The hardware system was
used as a testbed for parallel software research. Experience
and evaluation results were fed back into the next round of
R & D on larger hardware and software systems; in this way
the R & D was bootstrapped.

It started with the development of the Multi-PSI
[Taki 1988]. The purpose of the hardware development was
not only architectural research on knowledge processing
hardware, but also the preparation of a testbed for
efficient implementation of the kernel language.
The Multi-PSI was also intended to be a useful tool and
environment for parallel software research and development.
That is, the hardware was not just an experimental machine,
but a reliable system developed in a short
period, with measurement and debugging facilities for
software development. After construction of the Multi-PSI/V1
and /V2 with their language implementations, various
parallel programs and much technology and know-how of
parallel software have been accumulated [Nitta et al. 1992]
[Chikayama 1992]. The systems have been used as the
advanced software development environment for the parallel
inference machines.

2.2 Multi-PSI/V1

The first hardware was the Multi-PSI/V1 [Taki 1988]
[Masuda et al. 1988], which started operation in spring
1986. The personal sequential inference machine PSI
[Taki et al. 1984] was used for the processing elements. It
was a development result of the initial stage of the
project. Six PSI machines were connected by a mesh network,
which supported so-called wormhole routing. The
first distributed implementation of GHC was built on
it [Ichiyoshi et al. 1987]. (Distributed implementation
means a parallel implementation on distributed memory
hardware.) Execution speed was slow (1K LIPS; LIPS =
logical inferences per second) because the interpreter system
was written in ESP (the system description language
of the PSI). However, basic algorithms and techniques for
distributed implementation of GHC were investigated on
it. Several small parallel programs were written and executed
on it for evaluation, and preliminary experiments
on load balancing were also carried out.

2.3 From GHC To KL1 

Since GHC had only the basic functions that a kernel
concurrent logic language had to support, language extensions
were needed for the next, more practical system.
The kernel language KL1 was designed with consideration
of execution efficiency, operating system support,
and some built-in functions [Ueda and Chikayama 1990]
[Chikayama 1992]. An intermediate language, KL1-B,
which was the target language of the KL1 compiler, was also
designed [Kimura and Chikayama 1987]. In the Multi-PSI/V2
and one PIM model, the binary code of KL1-B is directly
interpreted by microprogram; that is, KL1-B is
the machine language itself. In the other PIM models, KL1-B
code is converted to lower-level machine instruction
sequences and executed by hardware.

2.4 Multi-PSI/V2 

The second hardware system was the Multi-PSI/V2
[Takeda et al. 1988] [Nakajima 1992], which was improved
enough in performance and functions to be called
the first experimental parallel inference machine. It
started operation in 1988 and was demonstrated at
the FGCS’88 international conference.

The Multi-PSI/V2 included 64 processors, each
of which was equivalent to the CPU of the PSI-II
[Nakashima and Nakajima 1987], a smaller and faster
model of the PSI. The processors were connected by a
two-dimensional mesh network with improved speed (10M
Bytes/s, full duplex in each channel). KL1-B was the
machine language of the system, executed by microprogram.
Almost all the runtime functions of KL1 were
implemented in microprogram. The KL1 implementation
was much improved in execution efficiency compared
with the Multi-PSI/V1, through fewer inter-processor
communication messages, efficient garbage collection, etc.
It attained 130K LIPS (in KL1 append) in single processor
speed. Tables 1 to 4 include specifications of the
Multi-PSI/V2. Since 1988, more than 15 systems, large
systems with 64 processors and small ones with 32 or 16
processors, have been in operation for parallel software R &
D in ICOT and in cooperating companies.

A powerful simulator of the Multi-PSI/V2 was also developed
as a software development environment. It was
called the pseudo Multi-PSI, and was available on the Prolog
workstation, PSI-II. A very special feature resulted
from the similarity of the PSI-II CPU and the processing element
of the Multi-PSI/V2. Usually, the PSI-II executed the ESP
language with a dedicated microprogram. However, it loaded
the KL1 microprogram dynamically at the activation of the
simulator system. The simulator executed KL1 programs
at a speed similar to that of a single Multi-PSI/V2
processor. Since PIMOS could also be executed on the
simulator, programmers could use the simulator as an environment
similar to the real Multi-PSI/V2, except for
speedup with multiple processors and process scheduling.
The pseudo Multi-PSI was a valuable system for
initial debugging of KL1 programs.



2.5 Software Development on the 
Multi-PSI/V2 

The parallel operating system PIMOS (the first version) and
four small application programs (benchmark programs)
[Ichiyoshi 1989] had been developed by FGCS’88.
Much effort was devoted in PIMOS development to realizing
a good environment for programming, debugging, execution
and measurement of parallel programs. In the
development of the small application programs, several important
research topics of parallel software were investigated,
such as concurrent algorithms with large concurrency
without increase of complexity, programming
paradigms and techniques for efficient KL1 programs, and
dynamic and static load balancing schemes for dynamic
and non-uniform computation.

PIMOS has been improved through several versions,
and had been ported to the PIM by 1992. The small application
programs, pentomino [Furuichi et al. 1990], best-path
[Wada and Ichiyoshi 1990], PAX (natural language
parser) and tsume-go (a board game) were improved,
measured and analyzed until 1989. They are still used
as test and benchmark programs on the PIM.

These developments showed that the KL1
system on the Multi-PSI/V2 with PIMOS had reached a
sufficient performance level for practical use, and had
realized sufficient functions for describing complex concurrent
programs and for experiments in software-controlled
load balancing.

Several large-scale parallel application programs have
been under development since late 1989 [Nitta et al. 1992], and
this work is still continuing. Some of them have been ported
to the PIM.
2.6 Parallel Inference Machine PIM

2.6.1 Five PIM Models 

Design of the parallel inference machine PIM was started
concurrently with the manufacturing of the Multi-PSI/V2.
Some research items in hardware architecture were omitted
in the development of the Multi-PSI/V2, because of
the short development time needed for starting the parallel
software development. So, PIM took a greedy R & D
plan, focusing on both architectural research and the
realization of a software development environment.

The first novel architectural feature tried was the use of
multiple clusters. A small number of tightly-coupled processors
with shared memory formed a cluster. Many clusters
were connected by a high-speed network to construct
the PIM system with several hundred processors. Benefits
of the architecture will be discussed in section 3.7.

Many component technologies had to be developed
or improved to realize the new system, such as parallel
cache memory suitable for frequent inter-processor communications,
high-speed processors for symbolic processing,
improvement of the network, etc. For R & D of
better component technologies and their combinations,
a development plan for five PIM models was made, so
that different component architectures and their combinations
could be investigated by assigning independent
research topics or roles to each model.

Two models, PIM/p [Kumon et al. 1992] and PIM/c
[Nakagawa et al. 1992], took the multi-cluster structure.
They include several hundred processors, a maximum of 512
in PIM/p and 256 in PIM/c. They were developed both
for architectural research and software R & D. Each
investigated a different network architecture and processor
structure.

Two other models, PIM/k [Sakai et al. 1991] and
PIM/i [Sato et al. 1992], were developed for
experimental use of intra-cluster architecture. A two-layered
coherent cache memory which enabled a larger number of
processors in a cluster, a broadcast-type coherent cache
memory, and a processor with an LIW-type instruction set
were tested.

The remaining model, PIM/m [Nakashima et al. 1992], did
not take the multi-cluster structure, but focused on strict
compatibility with the Multi-PSI/V2, with improved
processor speed and a larger number of processors. The
maximum number of processors will be 256. The performance
of a processor will be four to five times higher at
peak speed, and 1.5 to 2.5 times higher on average, than
the Multi-PSI/V2. The processor is similar to the CPU
of the PSI-UX, the most recent version of the PSI machine.
A simulator, pseudo-PIM/m, was also prepared like the
pseudo Multi-PSI. Of all the models, the PIM/m was aimed
most at being a parallel software development machine.

Architecture and specifications of each model will be 
reported in section 4. 

Experimental implementation of some LSIs of these
models started in 1989. The final design was almost
fixed in 1990, and manufacturing of the whole system
proceeded in 1991. From 1991 to spring 1992,
assembly and testing of the five models was carried out.

2.6.2 Software Compatibility 

The KL1 language is common to all five PIM models.
Except for execution efficiency, any KL1 program,
including PIMOS, can run on all the models. Hardware
architecture differs between two groups: the Multi-PSI
and PIM/m form one, and the other PIM models
the other. However, from the programmers’ view, the abstract
architectures are designed to be similar, as follows.

Load allocation to processors is fully controlled
by programs on the Multi-PSI and the PIM/m. It is
sometimes written by programmers directly, and sometimes
specified by load allocation libraries. The programmers
are often researchers of load balancing techniques. On
the other hand, load balancing within a cluster is completely
controlled by the KL1 runtime system (not by KL1 programs)
on the PIM models with the multi-cluster
structure. That is, programmers do not have to think
about the multiple processors in a cluster, but only specify load
allocation to each cluster in their programs. This means that
a processor of the Multi-PSI or PIM/m corresponds to a
cluster of the PIM models with the multi-cluster structure,
which simplifies porting of KL1 programs.
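
The correspondence between a processor and a cluster can be sketched
as follows. The Python below is an illustration only: the class and
function names are hypothetical, and the least-loaded choice inside a
node is an assumed policy standing in for whatever the KL1 runtime
system actually does within a cluster.

```python
class Machine:
    """Abstract machine seen by a program: a list of numbered nodes.

    On the Multi-PSI or PIM/m a node is one processor; on a
    multi-cluster PIM model a node is a cluster of processors.
    """

    def __init__(self, num_nodes, procs_per_node):
        self.load = [[0] * procs_per_node for _ in range(num_nodes)]

    def allocate(self, node):
        """Place one goal on `node`; return (node, processor chosen)."""
        procs = self.load[node]
        # Runtime decision, invisible to the program (assumed policy:
        # pick the least-loaded processor in the node).
        proc = procs.index(min(procs))
        procs[proc] += 1
        return node, proc


def program(machine, num_goals):
    """The 'user program': it names nodes only, never processors."""
    num_nodes = len(machine.load)
    return [machine.allocate(i % num_nodes) for i in range(num_goals)]


if __name__ == "__main__":
    single = Machine(num_nodes=2, procs_per_node=1)   # Multi-PSI style
    cluster = Machine(num_nodes=2, procs_per_node=2)  # multi-cluster style
    print(program(single, 4))
    print(program(cluster, 4))
```

The same program runs unchanged on both machines and names the same
sequence of nodes; only the processor chosen inside each node differs.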

2.7 KL1 Implementation for PIM 

The KL1 system must be the first regular system in the world
that can execute large-scale parallel symbolic processing
programs very efficiently. Execution mechanisms and
algorithms for the KL1 language had been sufficiently developed
for distributed memory architectures on the Multi-PSI/V2.
Some mechanisms and algorithms had to be
extended for the multi-cluster architecture of PIM. Ease
of porting the KL1 system to four different PIM models
was also considered in the language implementation
method. Only the PIM/m inherited the KL1 implementation
method directly from the Multi-PSI/V2.

To extend the execution mechanisms and algorithms
to suit the multi-cluster architecture, several technical
topics were addressed, such as avoiding data update
contention among processors in a cluster, automatic
load balancing in a cluster, extension of the inter-cluster
message protocol to cope with message
outstripping, parallel garbage collection in a cluster, etc.
[Hirata et al. 1992].

For ease of porting the KL1 system to four different
PIM models, a common specification of the KL1 system,
“VPIM (virtual PIM)”, was written in a “C”-like description
language, “PSL”, targeting a common virtual hardware.
VPIM was the executable specification of the KL1 execution
algorithms, which was translated to C
and executed to examine the algorithms. VPIM has been
translated to lower-level machine languages or microprograms,
automatically or by hand, according to each PIM
structure.

Preparation of the description language started in
1988. Study of efficient execution mechanisms and algorithms
continued until 1991, when VPIM was completed.
Porting of VPIM to the four PIM models partially
started in autumn 1990, and continued to spring 1992.
Now, the KL1 system with PIMOS is available on each
PIM model. On the other hand, the KL1 system on the
PIM/m, which was implemented in microprogram, was
made by converting the Multi-PSI/V2 microprogram by
hand or partially by automatic translation. Ahead of the
other PIM models, PIM/m started operation with the
KL1 system and PIMOS in summer 1991.

2.8 Performance and System Evaluation

Measurements, analysis, and evaluation should be done 
on various levels of the system shown below. 

1. Hardware architecture and implementations 

2. Execution mechanisms or algorithms of KL1 imple- 
mentation 

3. Concurrent algorithms of applications (algorithms 
for problem solving, independent from mapping) 
and their implementations 

4. Mapping (load allocation) algorithms 

5. Total system performance of a certain application 
program on a certain system 

Various studies have been
done on the Multi-PSI/V2. Items 1 and 2 were reported in
[Masuda et al. 1988] and [Nakajima 1992]. Items 3 to 5 were
reported in [Nitta et al. 1992], [Furuichi et al. 1990],
[Ichiyoshi 1989] and [Wada and Ichiyoshi 1990].

Preliminary measurements have just started on each PIM
model. Some intermediate results are included in
[Nakashima et al. 1992] and [Kumon et al. 1992].

Total evaluation of the PIM system will be done in the
near future; however, some observations and discussions
are included in section 6.

3 Characterizing the PIM and
KL1 system

The PIM and KL1 system has many advantageous features
for very efficient parallel execution of large-scale knowledge
processing, which often shows very dynamic runtime
characteristics and non-uniform computation, very different
from numerical applications on vector processors
and SIMD machines.



This section briefly clarifies the characteristics of the
targeted problem domain, and describes the various
advantageous features of the PIM and KL1 system that are
dedicated to efficient programming and processing in that
domain. They give the total system image and help to clarify
the differences and similarities between this system and
other large-scale multiprocessors recently available on the
market.

3.1 Summary of Features 

The total image of the PIM and KL1 system is briefly
outlined as follows. Detailed features, their benefits, and
the reasons why they were chosen are presented in the
following sections.

Distributed memory MIMD machine:

The global structure of the PIM is a distributed memory
MIMD machine in which hundreds of computation nodes are
connected by a high-speed network. Scalability and ease of
implementation are emphasized. Each computation node
includes a single processor or several tightly-coupled
processors, and a large memory. The processors are dedicated
to efficient symbolic processing.

Logic programming language: The kernel language KL1 is a
concurrent logic programming language, which is the single
language for both system and application descriptions.
Language implementation and hardware design are based on
the language specification.

KL1 is neither a high-level knowledge representation
language nor a language for a certain type of reasoning,
but a general-purpose language for concurrent and parallel
programming, especially suitable for symbolic computations.

KL1 has many beneficial features for writing parallel
programs in the application domains described below.

Application domain: Primary applications are large-scale
knowledge processing and symbolic computation. However,
large numerical computation with dynamic features, or with
non-uniform data and non-uniform computation
(non-data-parallel computation), is also targeted.

Language implementation: One KL1 system is implemented on
the distributed memory hardware; it is not a collection of
many KL1 systems implemented on each processing node. A
global name space is supported for code, logical variables,
etc. Communication messages between computation nodes are
handled implicitly by the KL1 system, not by KL1 programs.
An efficient implementation of small-grain concurrent
processes is adopted.

This implementation focuses on realizing the beneficial
features of the KL1 language for the application domains
described above.

Policy of load balancing: Load balancing between
computation nodes should be controlled by KL1 programs, not
automatically by hardware or by the language system. The
language system has to provide enough functions and
efficiency for experiments with various load balancing
schemes in software.

3.2 Basic Choices 

(1) Logic programming: The first choice was to 
adopt logic programming as the basis of the ker- 
nel language. The decision is mainly due to the 
insights of ICOT founders, who expected that logic 
programming was suitable for both knowledge pro- 
cessing and parallel processing. A history, from 
vague expectations on logic programming to the 
concrete design of the KL1 language, is explained 
in [Chikayama 1992]. 

(2) Middle-out approach: A middle-out approach of 
R & D was taken, placing the KL1 language as the 
central layer. Based on the language specification, 
design of the hardware and the language implemen- 
tation started downward, and writing the PIMOS 
operating system and parallel software started up- 
ward. 

(3) MIMD machine: The other choices concerned the basic
hardware architecture.

Dataflow architectures before the mid-1980s were considered
not to provide enough performance for their hardware cost,
according to observations of research results in the
initial stage of the project.

SIMD architecture seemed inefficient for applications with
dynamic characteristics or low data-parallelism, which are
often seen in knowledge processing.

MIMD architecture had no major drawbacks and was the most
attractive from the viewpoint of ease of implementation
with standard components.

(4) Distributed memory structure: A distributed memory
structure is suitable for constructing a very large system,
and is easy to implement.

Recent large-scale shared memory machines with
directory-based cache coherency mechanisms claim good
scalability. However, when the block size (the coherency
management unit) is large, inter-processor communication
with frequent small data transfers seems inefficient, and
KL1 programs require frequent small data transfers. When
the block size becomes small, a large directory memory is
needed, which increases the hardware cost.

Single assignment languages need special memory management
such as dynamic memory allocation and garbage collection.
This management should be done as locally as possible for
the sake of efficiency. Local garbage collection requires
separation of local and global address spaces with some
indirect referencing mechanism or address translation, even
in a scalable shared memory architecture. The merits of
low-cost communication in the shared memory architecture
decrease significantly in such a case.

These are the reasons for choosing the distributed memory
structure.
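
The block-size trade-off for directory-based coherence can be
made concrete with a little arithmetic. The sketch below is
not taken from the paper: it assumes a full bit-vector
directory with one presence bit per processor for every
memory block and ignores state bits. The function name
`directory_overhead` and the machine size of 512 processors
(the maximum PIM/p size, used only as an illustrative
figure) are our own choices.

```python
def directory_overhead(num_processors, block_bytes):
    """Ratio of directory bits to data bits for one memory block.

    Assumption (not from the paper): a full bit-vector directory that
    keeps one presence bit per processor for every block.
    """
    directory_bits = num_processors
    data_bits = 8 * block_bytes
    return directory_bits / data_bits


# Hypothetical 512-processor shared memory machine.
overheads = {block: directory_overhead(512, block) for block in (128, 64, 16)}
for block, ratio in overheads.items():
    print(f"{block:4d}-byte block: directory/data = {ratio:.2f}")
```

Under these assumptions a 64-byte block already needs as
much directory memory as data memory, and shrinking the
block to 16 bytes quadruples that cost, which is the effect
the paragraph above refers to.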

3.3 Characterizing the Applications 

(1) Characterization: Characteristics of knowledge
processing and symbolic computation are often quite
different from those of numerical computation on vector
processors and SIMD machines. Problem formalizations for
those machines are usually based on data-parallelism, that
is, parallelism for regular computation on uniform data.

However, the characteristics of knowledge and symbolic
computations on parallel machines tend to be very dynamic
and non-uniform. Contents and amount of computation vary
dynamically depending on time and space. For example, when
a heuristic search problem is mapped onto a parallel
machine, the workload of each computation node changes
drastically depending on expansion and pruning of the
search tree. Also, when a knowledge processing system is
constructed from many heterogeneous objects, each object
gives rise to non-uniform computation. The computation
loads of these problems can hardly be estimated before
execution.

Some classes of large numerical computation with- 
out data-parallelism also show the dynamic and 
non-uniform characteristics. 

Problems which have such dynamism and non-uniformity of
computation are called the dynamic and non-uniform problems
in this paper, covering not only knowledge processing and
symbolic computation but also large numerical computation
without data-parallelism.

The dynamic and non-uniform problems tend to involve
programs with more complex structure than the data-parallel
problems.

(2) Requirements for the system: Most of the software
systems on recent commercial MIMD machines with hundreds of
processors target data-parallel computation, and pay little
attention to other paradigms.

The dynamic and non-uniform problems give rise to new
requirements, mainly on software systems and a few on
hardware systems, which are listed below.

1. Descriptive power for complex concurrent programs

2. Ease of removing bugs

3. Ease of dynamic load balancing

4. Flexibility for changing the load allocation and
scheduling schemes, to cope with the difficulty of
estimating actual computation loads before execution

3.4 Characterizing the Language 

This subsection itemizes several advantageous features of 
KL1 that satisfy the requirements listed in the previous 
section. Features and characteristics of the concurrent 
logic programming language KL1 are described in detail 
in [Chikayama 1992]. 

The first three features are inherited from GHC, the basic
specification of KL1. These features give the language
enough descriptive power to write complex concurrent
programs. They are features of concurrent programming that
describe logical concurrency, independent of the mapping to
actual processors.

(1) Dataflow synchronization: Communication and
synchronization between KL1 processes are performed
entirely implicitly within the framework of ordinary
unification, based on the dataflow model. This implicitness
holds even for remote synchronization. The feature
drastically reduces synchronization and communication bugs
compared with explicit description using separate
primitives. The single-assignment property of logic
variables supports the feature.

(2) Small-grain concurrent processes: The unit of
concurrent execution in KL1 is each body goal of a clause,
which can be regarded as a process invocation. KL1 programs
can thus involve a large amount of concurrency implicitly.

(3) Indeterminacy: A goal (or process) can test and wait
for the instantiation of multiple variables concurrently.
The first instantiation resumes the goal execution, and
when a clause is committed (selected from the clauses whose
guard goals succeed), the other wait conditions are thrown
away. This function is valuable for describing "non-rigid"
processing within the framework of a side-effect-free
language. Speculative computation can be dealt with, and
dynamic load distribution can also be written.
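
The waiting behaviour in (3) can be illustrated with a small
sketch. It is written in Python rather than KL1, and the
names `LogicVar` and `wait_first` are hypothetical helpers
introduced here, not part of any KL1 system. The sketch
models only the essentials: a single-assignment variable, a
goal that waits on several variables at once, and the
discarding of the remaining wait conditions once the goal
has committed.

```python
class LogicVar:
    """Single-assignment variable: it can be bound at most once."""

    def __init__(self, name):
        self.name = name
        self.bound = False
        self.value = None
        self._waiters = []          # callbacks to run when a value arrives

    def bind(self, value):
        if self.bound:
            raise ValueError(f"{self.name} is already bound")
        self.bound, self.value = True, value
        waiters, self._waiters = self._waiters, []
        for resume in waiters:
            resume(self)

    def when_bound(self, resume):
        if self.bound:
            resume(self)
        else:
            self._waiters.append(resume)


def wait_first(variables, body):
    """Commit to whichever variable is instantiated first; ignore the rest."""
    state = {"committed": False}

    def resume(var):
        if state["committed"]:
            return                  # the other wait conditions are thrown away
        state["committed"] = True
        body(var)

    for var in variables:
        if state["committed"]:
            break
        var.when_bound(resume)


log = []
x, y = LogicVar("X"), LogicVar("Y")
wait_first([x, y], lambda v: log.append((v.name, v.value)))
y.bind("from producer 2")           # Y arrives first, so the goal commits on Y
x.bind("from producer 1")           # X arrives later and is ignored by this goal
print(log)
```

Which producer wins depends only on the order of
instantiation, which is what makes the choice indeterminate
from the program's point of view.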

The next features have been included in KL1 as extensions
to GHC. (4) was introduced to describe mapping (load
allocation) and scheduling; it is a feature for parallel
programming that controls actual parallelism among
processing nodes. (5) is provided for operating system
support. (6) is for the efficiency of practical programs.

(4) Pragma: A pragma is a notation to specify the
allocation of goals to processing nodes or the execution
priority of goals. A pragma does not affect the semantics
of a program, but controls the parallelism and efficiency
of actual parallel execution. Pragmas are usually attached
to goals after making sure that the program is correct.
They can be changed very easily, because they are
syntactically separated from the correctness aspect of a
program.

Pragma for load allocation: Goal allocation is specified
with a pragma, @node(X). X can be calculated in programs.
Coupled with (1) and (2), the load allocation pragma can
realize very flexible load allocation. Also, with (3) and
the pragma together, KL1 can describe a dynamic load
balancing program within the framework of a pure logic
programming language without side effects. Dynamic load
balancing programs are hard to write in pure functional
languages without indeterminacy.

Pragma for execution priority: Execution priority is
specified with a pragma, @priority(Y). Thousands of
priority levels are supported to control goal scheduling in
detail, without rigid ordering.

Combination of (3) and the priority pragma realizes
efficient control of speculative computations. The large
number of priority levels can be utilized in, e.g.,
parallel heuristic search to expand good branches of the
search tree first.

(5) Shoen function (meta-control for goal group):

The shoen function is designed to handle a set of goals as
a task, a unit of execution and resource management. It is
mainly used in PIMOS. Tasks can be started, stopped and
aborted, and a limit on resource consumption can be
specified. When errors or exception conditions occur, the
status is frozen and reported outside the shoen.

(6) Functions for efficiency: KL1 has several built-in
functions and data types whose semantics can be understood
within the framework of GHC but which are provided for the
sake of efficiency. These functions hide the demerits of
side-effect-free languages, and also avoid an increase in
computational complexity compared with sequential programs.
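
The claim in (4) that pragmas control execution without
changing the meaning of a program can be illustrated with a
small sketch. It is written in Python, not KL1, and nothing
in it is taken from the KL1 implementation: the function
`run`, its parameters `priority_of` and `node_of` (standing
in for @priority(Y) and @node(X)) and the toy "program" that
squares its argument are all assumptions made for the
example.

```python
import heapq
from itertools import count


def run(goals, priority_of, node_of, num_nodes):
    """Reduce every goal once; return (results, execution order, goals per node)."""
    tie = count()                       # keeps insertion order among equal priorities
    ready = []
    for g in goals:
        # heapq is a min-heap, so the priority is negated: larger runs first.
        heapq.heappush(ready, (-priority_of(g), next(tie), g))

    results, order, per_node = {}, [], [0] * num_nodes
    while ready:
        _, _, g = heapq.heappop(ready)
        per_node[node_of(g) % num_nodes] += 1   # stand-in for @node(X)
        results[g] = g * g                      # the "program": square the argument
        order.append(g)
    return results, order, per_node


goals = [3, 1, 4, 2]
# Two different choices of "pragmas" for the same program.
r1, order1, nodes1 = run(goals, priority_of=lambda g: g,
                         node_of=lambda g: g, num_nodes=2)
r2, order2, nodes2 = run(goals, priority_of=lambda g: -g,
                         node_of=lambda g: 0, num_nodes=2)
print(r1 == r2, order1, order2, nodes1, nodes2)
```

The two runs visit the goals in different orders and place
them on different nodes, yet compute the same results, which
is why the mapping and scheduling can be tuned after the
program is known to be correct.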
3.5 Characterizing the Language Implementation

The language features just described in the previous
section satisfy the requirements placed on a system by the
dynamic and non-uniform problems discussed in section 3.3.
Most of the special features of the language implementation
focus on enhancing those advantageous features of the KL1
language.

(1) Implicit communication:

Communication and synchronization among concurrent
processes are done implicitly by unifications on shared
logical variables. They are supported both within a
computation node and between nodes. It is especially
beneficial that remote synchronization is done implicitly,
just as local synchronization is.

A process (goal) can migrate between computation nodes
simply by being given a pragma, @node(X). When the process
has reference pointers, remote references are generated
implicitly between the computation nodes. The remote
references are used for remote synchronization and
communication.

These functions hide the distributed memory hardware from
"concurrent programming". That is, programmers can design
concurrent processes and their communications independently
of whether they are allocated to the same computation node
or to different nodes. Only "parallel programming" with
pragmas, that is, the design of load allocation and
scheduling, has to be concerned with the hardware structure
and network topology.

Implementation features of those functions are sum- 
marized below, including the features for efficiency. 

• Global name space on distributed memory hardware, in
which implicit pointer management among computation nodes
is supported for logical variables, structured data and
program code

• Implicit data transfer caused by unifications 
and goal (process) migration 

• Implicit message sending and receiving invoked 
with data transfer and goal sending, including 
message composition and decomposition 

• Message protocols able to reduce the number 
of messages, and also protocols applicable to 
message outstripping 

(2) Small-grain concurrent processes: An efficient
implementation of small-grain concurrent processes is
realized, coupled with low-cost communication and
synchronization among them.

Process scheduling with low-cost suspension and re- 
sumption, and priority management are supported. 



Efficient implementation allows actual use of a lot 
of small-grain processes to realize large concurrency. 
A large number of processes also gives flexibility for 
the mapping and load balancing. 

Automatic load balancing in a cluster is also sup- 
ported. It is a process (goal) scheduling function in 
a cluster implemented with priority management. 
The feature hides multiprocessors in a cluster from 
programmers. They do not have to think about 
load allocation in a cluster, but only have to pre- 
pare enough concurrency. 

(3) Memory management: The following garbage collection
mechanisms are supported.

• Combination of incremental garbage collection, using a
subset of reference counting, with stop-and-collect copying
garbage collection

• Incremental releasing of remote reference pointers
between computation nodes with a weighted reference
counting scheme

Dynamic memory management including garbage collection
looks essential both for symbolic processing and for
parallel processing of the dynamic and non-uniform
problems, because the single assignment feature, which
these problems strongly need, requires dynamic memory
allocation and reclamation.

The efficiency of the garbage collectors is one of the key
features of a practical language system for parallel
symbolic processing.

(4) Implementation of shoen function: A shoen represents a
group of goals (processes), as presented in the previous
subsection. The shoen mechanism is implemented not only
within a computation node but also across nodes. Namely,
processes in a task can be distributed among computation
nodes, and still be controlled all together with the shoen
functions.

(5) Built-in functions for efficiency: Several built- 
in functions and data types are implemented to keep 
up with the efficiency of sequential languages. 

(6) Including OS kernel functions: Figure 2 shows the
relation between the KL1 implementation and operating
system functions. The KL1 implementation includes so-called
OS kernel functions such as memory management, process
management and scheduling, communication and
synchronization, virtual single name space, message
composition and decomposition, etc., while PIMOS includes
upper OS functions such as the programming environment and
user interface.

The reason why the OS kernel functions are included in the
KL1 implementation is that the implementation needs to use
those functions at as low a cost as possible. The cost of
those functions affects the actual execution efficiency of
the advantageous features of the KL1 language, such as a
large number of small-grain concurrent processes, implicit
synchronization and communication among them (even between
remote processes), indeterminacy, scheduling control with a
large number of priority levels, process migration
specified with pragmas, etc. Those features are
indispensable for concurrent and parallel programming and
efficient parallel execution of large-scale symbolic
computation with dynamic characteristics, or large-scale
non-data-parallel numerical computations.

When a parallel processing system with a similar purpose is
constructed on a standard operating system, the interface
level to the OS kernel may be too high (or may incur too
much overhead). Some restructuring of the OS implementation
layers might be needed for standard parallel operating
systems to support such large-scale computation with
dynamic characteristics.
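
Item (3) above mentions a weighted reference counting scheme
for remote references. The following sketch shows the
general technique as it is commonly described, not the
actual PIM data structures or message formats: the class
names, the constant `INITIAL_WEIGHT` and the way weight
exhaustion is handled are all assumptions made for the
example. The invariant is that the weights held by all
remote references add up to the weight recorded at the
owner, so copying a reference needs no message to the owner
and releasing one needs a single message.

```python
INITIAL_WEIGHT = 16   # assumed value; a power of two so that halves stay integral


class ExportedObject:
    """Owner-side record for data that is referenced from other nodes."""

    def __init__(self, name):
        self.name = name
        self.weight = 0          # sum of the weights of all remote references
        self.reclaimed = False


class RemoteRef:
    """A reference held on another node, carrying part of the weight."""

    def __init__(self, target, weight):
        self.target = target
        self.weight = weight


def export(obj):
    """Create the first remote reference to obj."""
    obj.weight += INITIAL_WEIGHT
    return RemoteRef(obj, INITIAL_WEIGHT)


def copy_ref(ref):
    """Duplicate a reference by splitting its weight (no message to the owner)."""
    if ref.weight == 1:
        # Weight exhausted. This sketch simply asks the owner for more weight;
        # real schemes differ in how they handle this case.
        ref.target.weight += INITIAL_WEIGHT
        ref.weight += INITIAL_WEIGHT
    half = ref.weight // 2
    ref.weight -= half
    return RemoteRef(ref.target, half)


def release(ref):
    """Drop a reference: one message returns its weight to the owner."""
    obj = ref.target
    obj.weight -= ref.weight
    ref.weight = 0
    if obj.weight == 0:
        obj.reclaimed = True     # no remote references remain


cell = ExportedObject("cell")
a = export(cell)
b = copy_ref(a)
c = copy_ref(b)
weights_after_copy = (a.weight, b.weight, c.weight, cell.weight)
release(a)
release(c)
still_live = not cell.reclaimed
release(b)
print(weights_after_copy, still_live, cell.reclaimed)
```

Because copies never contact the owner, the scheme avoids
the race between increment and decrement messages that
plain distributed reference counting has to guard against.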

3.6 Policy of Load Balancing 

The basic policy adopted is that load balancing between
computation nodes should be completely controlled by KL1
programs, not automatically by hardware or by the language
system. There are two reasons.

One is that KL1 can describe load balancing programs with
ordinary logic programming features. Since many research
topics on load distribution remain unsolved, especially for
dynamic problems, experimenting with software-controlled
load balancing is advantageous in terms of flexibility. It
does not incur significant overhead because the KL1
language system provides a very low-cost implementation.

The other is that distributed memory architecture needs
strong locality of computation, for which some help from
programmers is important for better load balancing.

The language system has to support enough functions and
efficiency for experiments with various load balancing
schemes in software.

Some load balancing schemes are prepared as utility 
programs, available for application programmers. 

3.7 Characterizing the Hardware Architecture

Features of PIM hardware architecture are listed below. 
Some of them are specialized for symbolic processing and 
large-scale parallel computation of dynamic problems, 
and some of them are standard. 

(1) Distributed memory MIMD machine:

The target hardware is a large-scale MIMD machine with a
distributed memory structure. Hundreds of processing nodes
are connected by a high-speed network. This was a basic
choice of the R & D. The structure was considered to have
large scalability, to be easy to implement, and to be
suitable for separating local and global garbage
collection.

(2) Cluster structure: Eight processors, tightly coupled
with a shared bus and shared memory, form a cluster. Many
clusters are connected by a high-speed network to form the
total system. Programmers deal with a cluster as one
computation node with large computation power and large
memory, since automatic load balancing within a cluster is
supported by the language system.

The cluster is a substructure of the PIM, realizing a
low-latency and high-bandwidth connection between
processors. There are two major advantages of the cluster
structure. The first is its applicability to problems which
have less locality, which a distributed memory architecture
can hardly process efficiently. The second is higher
efficiency of memory usage compared with fully distributed
memory systems with the same memory size. A substructure
with a higher-bandwidth inter-processor connection is
effective in reducing the memory size needed per processor
while keeping the same efficiency of parallel processing.
It affects the total system cost significantly.

A disadvantage is that heterogeneous inter-processor
connections increase the complexity of hardware
implementation; however, the cluster with tightly coupled
processors will be a standard component in the near future.

(3) Large memory against processing power: Non-uniform
computation or dynamic computation with wide variation of
grain size requires larger memory to keep the processing
efficiency, compared with data-parallel computation,
because extra work is needed to fill the idling time caused
by irregular synchronization, and this requires more
working space in memory.

(4) High-speed network: High-speed network connection
between processing nodes has already become standard.
However, the ratio of network load to processor load caused
by network communications is different from the case of
numerical processing. Management of the virtual single name
space usually incurs extra processor load for each
communication, compared with the simple data transfer of
numerical processing. As a result, less network bandwidth
is needed relative to processing power.

On the other hand, parallel symbolic computation with
dynamic features often gives rise to remote
synchronizations with small data transfers. The response
time of network communication is more important than
bandwidth in such cases.

(5) Coherent cache memory: Each processor in a cluster has
coherent cache memory with a write-back strategy. The basic
technology is similar to the standard coherent cache memory
used in commercial tightly coupled multiprocessors.
However, cache-to-cache data transfers, caused by
inter-processor communications, occur more often than in
the usual time-sharing use of commercial multiprocessors.
Optimization of cache commands and bus protocols for such
usage is important to reduce bus traffic.

(6) Dedicated processors: Processors include special
features for tag handling, data type checking and
branching, and pointer dereferencing for efficient KL1
execution. These features are useful not only for symbolic
processing, but also for an efficient implementation of a
single-assignment language, which is needed for the
parallel processing of the dynamic and non-uniform
problems.

The processors have dedicated instruction sets derived from
the abstract instruction set KL1-B.

Standard techniques such as pipelining and RISC-like
instruction sets are also used.

4 Machine Architecture and Hardware

The overall structure and features of the PIM system were
presented in the previous section. This section shows the
machine architecture, hardware implementations and some
technical data of each PIM model in detail.

4.1 Overview of Five PIM Models 

Five PIM models have been developed, which have different
architectures or different combinations of component
technologies, and have different roles in the R & D.

PIM/p : PIM/p is the largest PIM model, containing a
maximum of 512 processors. PIM/p focuses both on
architectural research and on actual use in software R & D.

PIM/p has the multi-cluster architecture shown in Figure 3.
A maximum of 64 clusters can be connected. The connection
network has a hypercube topology. Two independent networks
are connected to each cluster.

Each cluster contains eight processors connected with a
shared bus and shared memory. A processor has coherent
cache memory, a network interface unit "NIU", and an I/O
device interface (SCSI bus) [Kumon et al. 1992].

Processors in all PIM models have SCSI buses, which 
are used to connect FEPs (Front End Processors) and 
hard disks. The PSI-UX [Nakashima et al. 1992] is used 
for the FEP, as an intelligent I/O device for human- 
machine interface. 

PIM/m : PIM/m targets use as a software development machine
and strict compatibility with the Multi-PSI/V2. 256
processors are connected by a two-dimensional mesh network.
The structure is shown in Figure 4. 32 hard disks, 20 GB in
total, and many FEPs are connected [Nakashima et al. 1992].

Figure 3: Overview of PIM/p Architecture

Figure 4: Overview of PIM/m Architecture



PIM/c : PIM/c also takes the multi-cluster archi- 
tecture including 256 processors in total. A 
cluster contains eight processors. 32 clusters 
are connected with a crossbar switch network 
[Nakagawa et al. 1992]. 

PIM/k : PIM/k focuses on architectural research within a
cluster. A hierarchical cache system has been investigated
in order to connect a larger number of processors in a
cluster [Sakai et al. 1991]. Four processors share a local
bus and a second cache; they form a mini-cluster. Four
mini-clusters are connected to a shared memory bus and
shared memory (Figure 5).



PIM/i : PIM/i is also a research system. An LIW-type
instruction set and a broadcast-type cache protocol have
been investigated [Sato et al. 1992].



The global configurations of the five PIMs are summarized
in Table 1.

Specifications of the components, namely processors,
networks, and cache systems, are reported in the following
subsections.

Figure 5: Overview of PIM/k Architecture



Table 1: Global Configuration

              Topology       Number of Clusters  Total Number of PEs  Memory Size/Cluster
PIM/p         hypercube x 2  64                  512                  256 MB
PIM/m         mesh           256                 256                  80 MB
PIM/c         crossbar       32                  256                  160 MB
PIM/k         -              1 †                 16                   1 GB
PIM/i         -              2                   16                   320 MB
Multi-PSI/V2  mesh           64                  64                   80 MB

(† : four mini-clusters included)



4.2 Processing Element 

Since the KL1 implementation requires frequent runtime type
checking, all CPUs of the PIM models are designed with a
tagged architecture similar to that of the Multi-PSI.

PIM/p, PIM/i and PIM/k have RISC-like instruction sets,
whereas PIM/m and PIM/c have CISC-like micro programmable
instruction sets (Table 2). The former processors execute
machine instructions which are at a level still lower than
KL1-B. The latter processors interpret KL1-B code by
horizontal microprogram.

The CPU of PIM/p [Kumon et al. 1992] has a unique feature
called macro-call [Shinogi et al. 1988] instructions for
light-weight subroutine calls. These instructions keep the
compiled user program code small and reduce the overhead of
subroutine calls. The CPU also has further instructions
dedicated to the KL1 implementation, such as dereference
instructions and MRB [Chikayama and Kimura 1987]
incremental garbage collection instructions. The CPU has a
four-stage pipeline structure.

The CPU of PIM/m [Nakashima et al. 1992] is a mi- 
croprogram controlled processor with five-stage pipelin- 
ing. The instruction set is KL1-B itself, which is binary 
compatible with Multi-PSI/V2. Sophisticated data type 
checking and the automatic dereference mechanism are 
special features. 

The CPU of PIM/i experiments with an LIW (long instruction
word) type instruction set.

4.3 Network 

Networks are summarized in Table 3.

In PIM/p, each processor has an NI, and four NIs are
connected to a router. The router works as a node in the
network. There are two hypercube networks to attain large
bandwidth.

PIM/m has a two-dimensional mesh network, similar to the
Multi-PSI. The networks of PIM/p and PIM/m use so-called
wormhole routing.
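
To give a feeling for the two topologies, the sketch below
computes neighbours and minimal hop counts. It is a generic
illustration, not a description of the PIM routers. This
section does not state the hypercube dimension or the shape
of the mesh, so the sketch assumes a 6-dimensional hypercube
(one node per cluster for 64 clusters) and a 16 x 16 mesh
without wrap-around for 256 processors; the function names
are hypothetical.

```python
def hypercube_neighbours(node, dimensions):
    """Nodes whose binary address differs from `node` in exactly one bit."""
    return [node ^ (1 << d) for d in range(dimensions)]


def hypercube_hops(a, b):
    """Minimal hop count in a hypercube: the Hamming distance of the addresses."""
    return bin(a ^ b).count("1")


def mesh_hops(a, b):
    """Minimal hop count in a 2D mesh without wrap-around (Manhattan distance)."""
    (ax, ay), (bx, by) = a, b
    return abs(ax - bx) + abs(ay - by)


# Assumed sizes: 64-node hypercube (6 dimensions) and a 16 x 16 mesh.
print(hypercube_neighbours(0, 6))
print(hypercube_hops(0, 63))
print(mesh_hops((0, 0), (15, 15)))
```

Under these assumptions the worst-case distance is 6 hops in
the hypercube and 30 hops in the mesh; with wormhole routing
the latency penalty of extra hops is comparatively small,
since a message is forwarded before it has been received in
full.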









Table 2: Specification of Processing Element

              Instruction set            Cycle time  LSI fabrication  Line interval
PIM/p         RISC + macro instruction   60 nsec †   standard-cell    0.96 µm
PIM/m         CISC (micro programmable)  65 nsec     standard-cell    0.8 µm
PIM/c         CISC (micro programmable)  50 nsec †   gate-arrays      0.8 µm
PIM/k         RISC                       100 nsec    custom           1.2 µm
PIM/i         RISC                       100 nsec †  standard-cell    1.2 µm
Multi-PSI/V2  CISC (micro programmable)  200 nsec    gate-arrays      2.0 µm

(† are design specifications. They are under testing with longer cycle time.)



Table 3: Network

              # PEs in a cluster  # NIs in a cluster  Transfer Rate †
PIM/p         8                   8                   33 MB/sec ‡ x 2
PIM/m         1                   1                   8 MB/sec
PIM/c         8                   1                   40 MB/sec ‡
PIM/k         16                  -                   -
PIM/i         8                   1
Multi-PSI/V2  1                   1                   10 MB/sec

(PE = processing element, NI = network interface)
(† : per channel, full duplex   ‡ : design specifications)



PIM/c has one special processor, named the cluster
controller, in each cluster. The cluster controller is
connected to the shared bus and works as a network
interface to the crossbar network. The cluster controller
has overall responsibility for network communications.

4.4 Cache System 

Since KL1 programs give rise to asynchronous communications
among processors very frequently, shared bus traffic tends
to become very heavy. To solve this problem, optimized
coherent cache protocols were designed
[Goto et al. 1989][Matsumoto et al. 1987], which can keep
the locality high and reduce the shared bus traffic
[Nishida et al. 1990]. All PIMs have write-back type
coherent cache protocols (Table 4). Low-cost locking
mechanisms are also supported by utilizing the cache block
status.

5 KL1 Language Implementation

The KL1 language has many beneficial features for writing
efficient concurrent and parallel programs for the dynamic
and non-uniform problems, as explained in section 3.4. The
KL1 implementation focuses on realizing the execution
efficiency of those language features. This section briefly
looks at the language implementation methods and techniques
that correspond to the implementation features presented in
section 3.5. The purpose of this section is to give a
concrete image of several key features of the KL1
implementation. Detailed information is presented in
[Hirata et al. 1992] and [Nakajima 1992].

5.1 Execution Model of KL1

To help convey this image, the execution model of KL1 is
shown briefly. A KL1 program is made up of a collection of
clauses, whose form is:

    H :- G1, ..., Gm | B1, ..., Bn.
    (guard part)       (body part)

where H is the head and the Gi are the guard goals, which
are collectively called the guard part. The Bi are the body
goals, and the vertical bar ( | ) is the commitment
operator.

The guard part can be considered as a pattern match and
condition tests. If there are alternative clauses, their
guard parts are tested sequentially. When a clause succeeds
in the pattern match and the condition tests, the clause
commits. The caller goal is reduced to the body goals of
the committed clause. These body goals are executed
concurrently (AND-parallel). A KL1 clause can be considered
as a rewrite rule, which rewrites the caller goal to the
body goals.

Table 4: Specification of Cache System

              Coherence Control                      Mapping           Cache Size
              Protocol                   # States †                    Instruction | Data
PIM/p         invalidation               4           4 way             64 KB
PIM/m         -                          -           direct-map        5 KB | 20 KB
PIM/c         invalidation               5           2 way             80 KB
PIM/k         hierarchical invalidation  4           (1st) direct-map  128 KB | 256 KB
                                                     (2nd) 4 way       1 MB | 4 MB
PIM/i         broadcasting               6           direct-map        160 KB | 160 KB
Multi-PSI/V2  -                          -           direct-map        20 KB

(† does not include locking state.)

Figure 6: Execution Model of KL1
(Labels in the figure: Processing Element, Current Goal,
Ready Goals, Suspended Goals; creation by goal rewriting,
suspension by guard unification, resumption by body
unification.)

An execution model of KL1 is shown in Figure 6. There is a
goal pool which holds the ready goals to be rewritten. One of
the ready goals is taken from the goal pool for execution;
this is the current goal. When there is a clause that matches
the current goal and succeeds in the condition tests, the
current goal is rewritten. The rewritten goals are placed back
in the goal pool.

Goals may have common variables, which are used for
communication and synchronization. Let us assume that there
are two goals sharing a logical variable. A body unification,
produced in a goal rewriting, can instantiate the variable.
Guard unifications that appear in the execution of the other
goal test the instantiated value of the variable. This is the
communication between the goals. When the variable is not
instantiated before the guard unification, and no other clause
can commit, the current goal is suspended. Instantiation of
the variable resumes the suspended goal. This is the
synchronization [Ueda and Chikayama 1990].
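
The goal pool, suspension on an uninstantiated variable, and resumption by instantiation can be illustrated with a small Python model. This is only a sketch of the idea, not the KL1 implementation: the names LogicVar, bind, run, producer and consumer are hypothetical, and an ordinary Python function stands in for the clauses of a predicate.

```python
from collections import deque

class LogicVar:
    """Single-assignment variable; suspended goals are hooked to it."""
    def __init__(self):
        self.bound = False
        self.value = None
        self.waiting = []            # goals suspended on this variable

SUSPEND = object()                   # marker: a guard needs an unbound variable

def bind(var, value, pool):
    """Body unification: instantiate var and resume the goals hooked to it."""
    assert not var.bound, "single assignment violated"
    var.bound, var.value = True, value
    pool.extend(var.waiting)         # resumed goals become ready again
    var.waiting.clear()

def producer(pool, x):
    bind(x, 42, pool)                # body unification X = 42
    return []                        # no further body goals

def consumer(pool, x, out):
    if not x.bound:                  # guard cannot be decided yet
        return SUSPEND, x
    out.append(x.value + 1)          # guard succeeded: commit, run the body
    return []

def run(pool):
    """Take ready goals from the pool and rewrite them until none is left."""
    trace = []
    while pool:
        name, fn, args = pool.popleft()          # the current goal
        result = fn(pool, *args)
        if isinstance(result, tuple) and result[0] is SUSPEND:
            result[1].waiting.append((name, fn, args))
            trace.append(name + ":suspended")
        else:
            pool.extend(result)                  # body goals go back to the pool
            trace.append(name + ":reduced")
    return trace

x, out = LogicVar(), []
pool = deque([("consumer", consumer, (x, out)), ("producer", producer, (x,))])
trace = run(pool)
```

In this run the consumer is tried first, suspends on X, and is reduced only after the producer has instantiated X, so no busy waiting is involved.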

5.2 Support for Implicit Communication

There are several important mechanisms that realize the
implicit communication between computation nodes.

Let us assume that there are two goals sharing a variable in a
computation node. Each goal has a reference to the variable.
When one goal is sent to another computation node, a remote
reference has to be generated implicitly. The implicit
communication between the goals in the different nodes is
performed along this remote reference.

The important mechanisms are outlined below.

5.2.1 Global Name Space 

The implicit reference management across the computation nodes
is supported for logical variables, structured data and
program code. It provides a virtual global name space on
distributed memory hardware.

The export/import tables realize the feature. They are
indirect reference tables that separate the local address
space in a computation node from the global space for remote
references (Figure 7). A remote reference (external reference)
is identified by the pair (A,e), where A is the number of the
node in which the referenced data resides, and e is the entry
number in the export table. Registration to the tables is
performed dynamically when a new remote reference is made
[Ichiyoshi et al. 1987].

The entry number e does not change even when a local garbage
collection occurs which moves the location of the exported
cell. When a duplicated exportation/importation occurs, the
same table entry number is used (avoiding a new registration
to the table), which eliminates useless data transfer between
nodes [Ichiyoshi et al. 1988].
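
The two properties of the export table stated above (entry numbers that survive local garbage collection, and reuse of an entry on duplicated exportation) can be sketched as follows. The class ExportTable and its methods are hypothetical names chosen for illustration; the real tables are part of the KL1 runtime and are certainly organized differently.

```python
class ExportTable:
    """Indirect table from stable entry numbers to local cells (sketch)."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.entries = []            # entry number -> exported cell
        self.index = {}              # id(cell) -> entry number

    def export(self, cell):
        """Return the external reference (node, e), reusing an existing entry."""
        key = id(cell)
        if key not in self.index:
            self.index[key] = len(self.entries)
            self.entries.append(cell)
        return (self.node_id, self.index[key])

    def relocate(self, old_cell, new_cell):
        """A local GC moved the cell: update the entry, keep its number."""
        e = self.index.pop(id(old_cell))
        self.entries[e] = new_cell
        self.index[id(new_cell)] = e

    def lookup(self, e):
        return self.entries[e]

table_a = ExportTable("A")
x = ["cell X"]
ref1 = table_a.export(x)
ref2 = table_a.export(x)             # duplicated exportation reuses the entry
moved = list(x)                      # stand-in for the cell copied by a local GC
table_a.relocate(x, moved)
```

Remote nodes only ever hold the pair (A,e), so the relocation is invisible to them.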



Figure 7: Export and Import Tables (diagram labels: Node A
with Export Table and exported cell X; Node B with Import
Table and external reference EX holding (A,e))

5.2.2 Implicit Data Transfer

Data Transfer by Unifications: The implicit data transfer
between computation nodes is initiated by unifications.

A guard unification tries to test an instantiation of a
logical variable. When it is an external reference (EX in
Figure 7), a read request message, %read(X, ReturnAddress), is
sent to the node A, where X is the external reference (A,e),
and ReturnAddress is a newly created export table entry in the
node B.

The goal execution which initiated the guard unification is
suspended when no other clause can commit.

When the referenced cell has a concrete value V, it is
returned by the message %answer_value(ReturnAddress, V). The
message resumes the suspended goal, which waits for the value
V. If the referenced cell is not bound to a fixed value, the
read request is suspended until the variable is instantiated.

When a body unification tries to unify a remote cell X with a
term Y, a message %unify(X, Y) is sent to the referenced
cluster. When Y is atomic data or a structure, a simple data
transfer occurs.

The unifications between two uninstantiated variables in
different clusters may make reference loops between clusters.
This problem can be solved by controlling the direction of
reference pointers [Ichiyoshi et al. 1988].

Lazy Transfer: When structured data is transferred between
nodes, a one-level transfer is performed. The components of a
structure may be atomic data or nested structures. The atomic
data are copied and transferred directly, while the nested
structures remain as pointers and are transferred as external
references. This is called the one-level transfer. The policy
is that data transfer should be delayed as long as possible,
until the data is really needed for some operation.

Code Transfer: Program codes are handled as large structured
data. They are loaded on one cluster by a loader program at
first. Every KL1 goal holds a reference to the corresponding
code object. When a goal is sent to a cluster and the cluster
does not contain the corresponding code object, the goal
execution is suspended and the code is dynamically transferred
from the cluster which is pointed to by the external reference
held in the goal.

5.3

5.3.1

KL1 goals can be considered as lightweight processes. For
efficient parallel processing, a user task has to include a
lot of lightweight processes. The parallel operating system
needs to be able to handle a group of goals (lightweight
processes) all together as a task. The shoen supports the
meta-control facilities of execution control, resource
management and status monitoring for the goal group.

Shoen and Foster-Parent: Every goal has to belong to a certain
shoen. The foster-parent (fp) is a proxy shoen, which is
created in every computation node where the goals of the shoen
are executed. Each goal points to its foster-parent in the
node, and tests for meta-control requests at a certain
interval (e.g. at every goal reduction). Figure 8 shows the
relationship among shoens, foster-parents and goals.

A shoen and a foster-parent keep their environments, such as
status, resources, and the number of goals. Foster-parents
reduce the communication between each goal and its shoen, to
avoid an access bottleneck at the shoen.



Termination Detection: The termination detection of a goal
group is one of the difficult subjects in parallel computation
systems, especially when messages may be in transit on the
network. Even if all the foster-parents report their
termination, the shoen should not terminate when there are
goals in transit.

One of the solutions is the Weighted Throw Counting (WTC)
scheme [Rokusawa et al. 1988], which is an application of the
Weighted Reference Counting (WRC) scheme
[Watson and Watson 1987].
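
The general idea of weighted counting for termination detection can be sketched as below. This is a simplified illustration under stated assumptions, not the WTC protocol as published: the classes Shoen and FosterParent, the initial weight of 2^16, and the rule of halving the weight on every throw are all choices made for this example. Every goal in transit carries some weight, so the shoen can see that work is still outstanding even when every foster-parent is idle.

```python
class Shoen:
    """Tracks the weight still held by foster-parents and messages."""
    def __init__(self, total=1 << 16):
        self.total = total
        self.outstanding = total

    def give_back(self, weight):
        self.outstanding -= weight

    def terminated(self):
        return self.outstanding == 0


class FosterParent:
    def __init__(self, shoen):
        self.shoen, self.weight, self.goals = shoen, 0, 0

    def receive_goal(self, weight):
        """A thrown goal arrives together with the weight it carried."""
        self.weight += weight
        self.goals += 1

    def spawn_local(self):
        self.goals += 1

    def throw_goal(self):
        """Send one goal away; the message carries part of our weight."""
        assert self.goals > 0 and self.weight > 1
        carried = self.weight // 2
        self.weight -= carried
        self._remove_goal()
        return carried

    def finish_goal(self):
        self._remove_goal()

    def _remove_goal(self):
        self.goals -= 1
        if self.goals == 0:          # idle: return all remaining weight
            self.shoen.give_back(self.weight)
            self.weight = 0


shoen = Shoen()
fp0, fp1 = FosterParent(shoen), FosterParent(shoen)
fp0.receive_goal(shoen.total)        # the initial goal holds all the weight
fp0.spawn_local()                    # it is reduced into two goals
in_transit = fp0.throw_goal()        # one of them is thrown to another node
fp0.finish_goal()                    # the other finishes; fp0 is now idle
done_while_in_transit = shoen.terminated()
fp1.receive_goal(in_transit)
fp1.finish_goal()
done_at_end = shoen.terminated()
```

While the thrown goal is on the network, both foster-parents are idle but the weight carried by the message has not been returned, so the shoen does not terminate prematurely.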

5.3.2 Goal Scheduling 

The goal scheduling discussed here is a different concept from
the goal group management by shoen. Goal scheduling is the
state transition management of each goal among the ready,
execution, and suspension states. Execution priority is also
managed.

Basic Goal Scheduling Scheme: The ready goals in a computation
node are linked into a list forming a ready-goal-stack. In
principle, a current goal is popped from the ready-goal-stack,
then the goal rewriting is performed. The rewritten goals are
pushed to the ready-goal-stack, which results in depth-first
scheduling in a computation node.

Figure 8: Relationship of shoen and foster-parents (diagram
spans cluster 0, cluster 1 and cluster 2; legend: shoen =
shoen record, fp = foster-parent record, G = goal)

When any unification suspends, the goal is linked as a
suspended goal to the variable which caused the suspension.
Here, the non-busy waiting method has been adopted. That is,
the suspended goal is not scheduled until the variable is
instantiated. When a suspended goal is resumed, it is linked
to the ready-goal-stack again.

Execution priority of goals can be specified by pragmas. The
ready-goal-stack is managed according to the priority of
goals.
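
A priority-managed ready-goal-stack of the kind described above might look like the following sketch. The class ReadyGoalStack and the use of one Python list per priority level are assumptions made for illustration; the actual data structure in the KL1 runtime is a linked list of goal records.

```python
class ReadyGoalStack:
    """LIFO stacks of ready goals, one per priority level (sketch)."""
    def __init__(self):
        self.stacks = {}             # priority -> list used as a stack

    def push(self, goal, priority=0):
        self.stacks.setdefault(priority, []).append(goal)

    def pop(self):
        """Pop from the highest non-empty priority; None if nothing is ready."""
        if not self.stacks:
            return None
        top = max(self.stacks)
        goal = self.stacks[top].pop()
        if not self.stacks[top]:
            del self.stacks[top]
        return goal

rgs = ReadyGoalStack()
rgs.push("a")
rgs.push("b")                        # pushed later, so popped before "a"
rgs.push("urgent", priority=5)       # higher priority overrides stack order
order = [rgs.pop(), rgs.pop(), rgs.pop(), rgs.pop()]
```

Within one priority level the most recently pushed goal runs first, which gives the depth-first behaviour; a resumed goal would simply be pushed again with its own priority.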



Goal Distribution within a Cluster: An automatic load
balancing scheme is attempted within a cluster. An individual
ready-goal-stack is provided for the highest-priority goals in
each processing element, to avoid conflicts of access to the
common goal-stack [Sato et al. 1987]. The highest-priority
goals are distributed to keep the processor loads in good
balance [Hirata et al. 1992].



Inter-cluster Goal Distribution: A body goal, goal@node(CL),
is thrown with a message %throw to a node CL when the clause
commits. The node (more precisely, a certain processing
element in the cluster CL) that received the %throw message
links the goal to its ready-goal-stack as well as to the
foster-parent. If there is no foster-parent, one will be
created on the spot.



5.4 Memory Management 

Memory management functions such as dynamic memory allocation,
reclamation, and garbage collection are indispensable for
concurrent symbolic processing languages.

5.4.1 Incremental Garbage Collection by MRB 

The MRB (Multiple Reference Bit) method is a subset of the
reference counting scheme which maintains one-bit information
in pointers indicating whether the pointed data object has
multiple references to it or not [Chikayama and Kimura 1987]
[Inamura et al. 1988]. Garbage cells that have only a single
reference can be reclaimed incrementally.

The MRB is also useful for optimizing the updating of
structured data. In principle, structured data must be copied
when it is partially updated, because of the single-assignment
feature. However, it can be rewritten destructively when the
structure has only a single reference, preserving the
semantics of the single-assignment language.
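
The update optimization can be illustrated with a one-bit flag carried by each pointer. The sketch below is an assumption-laden model rather than the PIM implementation: Ref, duplicate and set_element are hypothetical names, and a Python list stands in for a structured data cell.

```python
class Ref:
    """A pointer carrying a one-bit multiple-reference flag."""
    def __init__(self, cell, mrb=False):
        self.cell = cell             # the pointed data, here a Python list
        self.mrb = mrb               # True: the cell may have other references

def duplicate(ref):
    """Copying a pointer turns the flag on in both copies."""
    ref.mrb = True
    return Ref(ref.cell, mrb=True)

def set_element(ref, i, value):
    """Single-assignment style update: returns a reference to the new value."""
    if not ref.mrb:
        ref.cell[i] = value          # sole reference: overwrite in place
        return ref
    new_cell = list(ref.cell)        # possibly shared: copy, keep the old cell
    new_cell[i] = value
    return Ref(new_cell)

v = Ref([1, 2, 3])
v2 = set_element(v, 0, 9)            # destructive, no copy is made
in_place = v2.cell is v.cell
shared = duplicate(v2)               # now two references exist
v3 = set_element(v2, 1, 8)           # must copy; 'shared' still sees the old value
```

Because the flag is a single bit and not a count, it cannot be cleared again when one of the references disappears, which is one way to see why some garbage is left for the other collectors.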

5.4.2 Garbage Collection within a Cluster 

A second garbage collector, which runs locally within a
cluster, is implemented to accompany the incremental garbage
collection by MRB, because the MRB scheme leaves some garbage
uncollected.

A so-called stop-and-copy scheme is basically adopted. A
parallel mechanism has been investigated in which all
processing elements in a cluster collect garbage in parallel
[Imai and Tick 1991].

5.4.3 Inter-Cluster Garbage Collection by WEC 

An incremental inter-cluster garbage collection scheme, the
weighted export counting (WEC) scheme, is employed
[Ichiyoshi et al. 1988]. It is an application of the weighted
reference counting (WRC) scheme [Watson and Watson 1987]. The
scheme has several advantages. One is the incremental garbage
collection capability with fewer message exchanges compared
with full reference counting. The other is the capability of
reducing messages in the case where imported data has to be
exported again to different clusters.
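
The message saving of weighted counting can be seen in a small model. This sketch follows the general weighted reference counting idea and is not taken from [Ichiyoshi et al. 1988]; ExportEntry, RemoteRef, the initial weight and the halving rule are illustrative assumptions.

```python
class ExportEntry:
    """Export table entry that remembers how much weight it has handed out."""
    def __init__(self, initial=1 << 16):
        self.weight = initial
        self.messages = 0            # messages received by the exporting node

    def release(self, returned):
        self.messages += 1
        self.weight -= returned
        return self.weight == 0      # True: the entry can be reclaimed


class RemoteRef:
    def __init__(self, entry, weight):
        self.entry, self.weight = entry, weight

    def split(self):
        """Re-export: give half of the weight to the new reference.

        No message to the exporting node is needed for this step.
        """
        assert self.weight > 1
        half = self.weight // 2
        self.weight -= half
        return RemoteRef(self.entry, half)

    def drop(self):
        """The reference became garbage: return its weight in one message."""
        returned, self.weight = self.weight, 0
        return self.entry.release(returned)


entry = ExportEntry()
r1 = RemoteRef(entry, entry.weight)
r2 = r1.split()                      # imported data exported again
r3 = r2.split()                      # and once more, to a third cluster
freed = [r1.drop(), r2.drop(), r3.drop()]
```

With full reference counting each of the two re-exports would need an increment message to the exporting node; here only the three release messages are sent.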

5.5 Abstract Instruction Set KL1-B 

KL1-B is the abstract instruction set which is common to the
PIM models. The role of KL1-B is similar to that of the WAM
[Warren 1983]. An explanation of each KL1-B instruction can be
found in [Kimura and Chikayama 1987].

Most of the KL1 implementation schemes presented in the
previous sections are realized as runtime routines that are
invoked implicitly by certain KL1-B instructions.

The KL1 compiler for PIM has two phases. The first phase
compiles a KL1 program into KL1-B code. The second phase
translates the KL1-B code into native code, linking it with
the runtime routines.

6 Measurements and Evaluation

This section describes some measurement results and
evaluations for the parallel inference machines and the
language system. The measurements focused on the low-cost
implementation of small-grain concurrent processes and of
remote synchronization and communication. Measurements on a
few benchmark programs are also reported, including the most
recent measurements on PIM/m.

6.1 Measurements and Evaluation on the Multi-PSI/V2

The KL1 language implementation includes so-called OS kernel
functions, as shown in section 3.5. Most of the implementation
features presented in section 5 concern the OS kernel
functions. Efficient implementations of these functions enable
the actual use of the beneficial features of the KL1 language
(presented in section 3.4) to write efficient parallel
programs for dynamic and non-uniform problems on large-scale
parallel machines.

The actual execution costs of some of these functions have
been measured on the Multi-PSI/V2. The goal scheduling cost
within a computation node, the communication cost between
nodes, and the communication overhead in benchmark programs
are reported. The measurement results show quite low-cost
implementations.

Note that the Multi-PSI/V2 has a mesh structure with 64
processing elements (PEs). There are 64 computation nodes,
each of which is one PE.

6.1.1 Goal Scheduling Cost in a Node 

Goal scheduling and synchronization costs within a processing
element (PE) have been measured [Onishi et al. 1990].

The enqueue and dequeue cost of the simplest goal is 5.4 µs
(27 micro-instruction steps). When a goal is rewritten to
several goals in a goal reduction, they are pushed on the
ready-goal-stack once (except for one goal which can be
executed directly). The enqueue and dequeue cost is the sum of
the pushing and popping costs of a goal on the
ready-goal-stack. It can be considered as a part of the
process fork cost.

The single-suspension cost of a simple goal is 14 µs (70
steps). When a goal is suspended waiting for a variable
instantiation, the goal is hooked to the variable cell. When
the variable is instantiated, the goal becomes executable and
is pushed on the ready-goal-stack. The single-suspension cost
is the sum of the hook, enqueue, and dequeue costs. It can be
considered as the synchronization cost between processes in a
processor.

The two-way multiple-suspension cost of a simple goal is 28 µs
(140 steps). A goal can wait for the instantiation of several
different variables. The first instantiation resumes the goal
execution. If the instantiation causes the commitment of a
clause, the other waiting conditions are thrown away. The
two-way multiple-suspension is the case of two variables. The
feature is a combination of indeterminacy and synchronization.
The cost increase over the single-suspension corresponds to
the implementation cost of the indeterminacy.

These low-cost implementations encourage the actual use of a
lot of small-grain processes. These costs of the goal
scheduling also give a guideline for the lower bound of
process grain size for efficient execution within a
computation node.
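
The three scheduling costs are consistent with the 200 ns micro-instruction cycle time quoted in section 6.1.2. The short check below only redoes that arithmetic; the variable names are chosen for this example.

```python
CYCLE_NS = 200                       # Multi-PSI/V2 cycle time (section 6.1.2)

def steps_to_us(steps):
    """Convert micro-instruction steps to microseconds."""
    return steps * CYCLE_NS / 1000

costs_us = {
    "enqueue + dequeue": steps_to_us(27),
    "single suspension": steps_to_us(70),
    "two-way suspension": steps_to_us(140),
}
# Extra cost attributed to indeterminacy (two-way minus single suspension).
indeterminacy_us = costs_us["two-way suspension"] - costs_us["single suspension"]
```

The results are 5.4, 14 and 28 microseconds, and the difference between the last two is 14 microseconds.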

6.1.2 Communication Cost Between Nodes 

The costs of the communication primitives have been measured
on the Multi-PSI/V2 system [Nakajima and Ichiyoshi 1990].
Sending a goal to another PE (a remote call of a lightweight
process) is realized by the %throw_goal message. Inter-PE
reading of values (used for remote synchronization and
communication) is realized by the %read and %answer_value
protocols.

Figure 9 shows the cost of handling those three messages at
both the sending and the receiving PE.

The cost is broken down into three parts. "Encode/decode KL1
term, etc." is for encoding and decoding message packets
to/from the internal representations of KL1 terms. It also
includes the maintenance of the export/import tables and the
foster-parent records (cf. section 5). It is the essential
part of the message handling.

"Basic message handling routine" in Figure 9 corresponds to
the simple data conversion between 40-bit tagged words and
byte-serial messages. The routine includes data transfer
to/from the hardware buffer. The cost can potentially be
reduced by hardware support. Copy_RPKB stands for copying a
message packet from the hardware buffer to the software
buffer. It is only executed when the hardware buffer tends to
be full.

The network transfer speed is 0.2 µs/byte. It takes less than
1 µs to hop one network node. This means that the message
handling cost explained above is dominant in the communication
cost.

Send_throw (a) shows the cost of sending a 65-byte %throw_goal
message containing a goal with three arguments. It takes 419
micro-instruction steps or 85 µs (cycle time = 200 ns).
Receive_throw (b) shows the cost of receiving the same
%throw_goal message and storing it in a goal stack.

The bar graphs (c), (d), (e) and (f) describe the cost of
sending and receiving a %read message and an %answer_value
message.

Figure 9: Message Handling Cost (bar graph; recoverable
labels: (a) Send_throw(goal(atom, EXREF, EXREF)) [65 bytes],
85 µsec (419 steps); Receive_throw; (c) Send_read(EXREF)
[14 bytes], 25 µsec (117 steps); Receive_read)

Table 5: Message Frequency and Reductions

Pentomino (39.3 KRPS on 1 PE)
Num of PEs                   4 PEs     16 PEs     64 PEs
execution time (sec)         54.63      14.62       4.35
total reductions (x1000)     8,317      8,332      8,340
reductions/sec (KRPS)        152.2      570.1    1,919.4
reductions/msg                 221        108         88
msg bytes/sec (x1000)         14.5      108.1      440.5

Bestpath (23.4 KRPS on 1 PE)
Num of PEs                   4 PEs     16 PEs     64 PEs
execution time (sec)        10.655      4.062      1.691
total reductions (x1000)     987.7    1,213.6    1,505.2
reductions/sec (KRPS)         92.7      298.8      890.1
reductions/msg                21.9       11.7        6.2
msg bytes/sec (x1000)        114.0      692.5    3,854.3

(KRPS: Kilo Reductions Per Second)

Table 6: Single Processor Performance of PIM/m

benchmark    condition         PIM/m        Multi-PSI/v2    Multi-PSI/v2 / PIM/m
append       1,000 elements    1.63 msec    7.80 msec       4.8
best-path    90,000 nodes      142 sec      213 sec         1.5
pentomino    8x5 box           107 sec      240 sec         2.2
15-puzzle    5,885 K nodes     9,283 sec    21,660 sec      2.3

Figure 10: Decomposition of Processor Time and Speed-up

Table 7: System Performance on Pentomino (8x5 box)

              PIM/m                    Multi-PSI/v2             Multi-PSI/v2
No. of PEs    Time          Speedup    Time          Speedup    / PIM/m
256 PE        1,124 ms      95.41      -             -          -
128 PE        1,290 ms      83.13      -             -          -
64 PE         2,162 ms      49.60      4,679 ms      51.20      2.16
32 PE         3,694 ms      29.03      8,278 ms      28.94      2.24
16 PE         6,910 ms      15.52      15,686 ms     15.27      2.27
1 PE          107,238 ms    1.00       239,545 ms    1.00       2.23

The sending and receiving cost of the %throw_goal message,
215 µs (1056 steps) in total, can be considered as the cost of
a process fork to a different PE, or a remote procedure call.
The cost of the %read and %answer_value messages, 182 µs (897
steps) in total, corresponds to the cost of the remote
synchronization.

Table 8: System Performance on Pentomino (10 x 6 box)

              PIM/m                       Multi-PSI/v2               Multi-PSI/v2
No. of PEs    Time             Speedup    Time            Speedup    / PIM/m
256 PE        103,655 ms       234.29     -               -          -
128 PE        188,452 ms       128.87     -               -          -
64 PE         359,268 ms       67.60      886,325 ms      -          2.47
32 PE         694,553 ms       34.96      1,729,430 ms    -          2.49
16 PE         1,367,240 ms     17.76      -               -          -
1 PE          24,285,015 ms    1.00       -               -          -

Comparing these values with the cost of local operations in
the previous section, the remote synchronization costs around
10 times more than the local one. The remote procedure call
costs more, but less than 40 times the local process fork.
These remote/local ratios seem low enough to encourage
small-grain concurrent processing between PEs. Measurements of
the communication cost give a guideline for the process grain
size (communication rate) needed to keep the communication
overhead low. When the process grain size decreases and
becomes close to the communication cost, the communication
overhead increases significantly (close to 50% of CPU time).
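
The remote/local ratios quoted above follow directly from the measured figures: 5.4 µs and 14 µs for the local process fork and synchronization (section 6.1.1), and 215 µs and 182 µs for their remote counterparts. A small check, with variable names chosen for this example:

```python
local_fork_us, local_sync_us = 5.4, 14.0        # section 6.1.1
remote_fork_us, remote_sync_us = 215.0, 182.0   # section 6.1.2

fork_ratio = remote_fork_us / local_fork_us     # remote call vs. local fork
sync_ratio = remote_sync_us / local_sync_us     # remote vs. local synchronization
```

The synchronization ratio comes out at 13 and the fork ratio just under 40, in line with "around 10 times" and "below 40 times".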

6.1.3 Measurements on Benchmark Programs 

Benchmark Programs: The following two benchmark programs are
used here.

• Pentomino: A program to find all solutions of a packing
  piece puzzle (Pentomino) by exploring the whole OR tree.
  Two-level dynamic load balancing is employed
  [Furuichi et al. 1990].

• Bestpath: A 160 x 160 grid graph is given together with
  non-negative edge costs. The program determines the lowest
  cost path from a given vertex to all vertices of the graph
  by performing a distributed shortest path algorithm
  [Wada and Ichiyoshi 1990]. The vertices are represented by
  KL1 processes, and they exchange shortest path information
  along the edges. 25,600 small processes work cooperatively.

Message & Reduction Profile: Table 5 shows the execution time,
the reduction and message rates, etc.
[Nakajima and Ichiyoshi 1990]. The average time of one
reduction in a PE is the inverse of the KRPS value: 25 µs (127
steps) in Pentomino, and 43 µs (214 steps) in Bestpath. These
are roughly the grain sizes of concurrent processes in a PE.
The message sending rates on 64 PEs are one message per 88
reductions in Pentomino, and one per 6 reductions in Bestpath.
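
The average reduction times can be recomputed from the single-PE KRPS values in Table 5 and the 200 ns cycle time. The helper name below is hypothetical; the numbers are those given in the text.

```python
CYCLE_US = 0.2                       # 200 ns micro-instruction cycle

def reduction_time_us(krps):
    """Average time of one reduction in microseconds for a given KRPS."""
    return 1000.0 / krps

pentomino_us = reduction_time_us(39.3)   # Pentomino on 1 PE
bestpath_us = reduction_time_us(23.4)    # Bestpath on 1 PE
pentomino_steps = round(pentomino_us / CYCLE_US)
bestpath_steps = round(bestpath_us / CYCLE_US)
```

This gives about 25.4 µs and 42.7 µs, that is 127 and 214 steps, matching the rounded values in the text.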

The average network traffic was reported in
[Nakajima and Ichiyoshi 1990], calculated from these figures.
Relative to the 10 Mbyte/s network channel bandwidth, the
average traffic on a channel is very small: 0.08% (Pentomino)
and 0.3% (Bestpath) of the bandwidth.



Communication Overhead: Profiling data of processor execution
has been measured on the two benchmark programs
[Nakajima 1992]. The execution time is broken down into the
four categories in Figure 10: computing time (reduction
operations), message handling time, cache-miss penalty, and
idling time. The average over all PEs is shown in the bar
graph. The resultant speed-up is also shown with the ideal
one.

Two-level dynamic load distribution is used in Pentomino.
Several thousand small processes are distributed to 64 PEs in
4.35 seconds adaptively. The graph shows low communication
overhead and good speedup. The degradation of the processor
workrate in the 64-PE execution is mainly caused by the
latency of load feeding to PEs.

In Bestpath, 25,600 small processes are distributed statically
on 64 PEs. They exchange messages to perform a distributed
algorithm. The inter-PE communication and the cache-miss
penalty degrade the performance because of the high
communication rate and the large working set. As the number of
PEs grows, the grid graph is divided into smaller blocks to
keep the workrate high, and this makes the inter-PE
communication rate higher. Bestpath includes speculative
computation, which increases with the number of PEs. It causes
lower speedup than the value calculated from the processor
workrate.

Measurement results in Table 5 and Figure 10 show the actual
communication rate and communication overhead. Programmers can
use a relatively high communication rate, one message per 6
reductions (measured in Bestpath), with a moderate CPU
overhead of approximately 15%. Considering that the network
load was 0.3% at that time, it is observed that the CPU load
(15% at that time) will limit the communication bandwidth when
the communication rate increases. The language implementation,
which supports the global name space on distributed memory
hardware, tends to increase the CPU load concerned with
network communication.

6.2 Preliminary Measurements on the PIM

6.2.1 Single Processor Performance 

Table 6 shows the single processor performance of PIM/m for
four benchmarks. The table also includes the performance of
Multi-PSI/V2 and the ratio of PIM/m to Multi-PSI/V2
(M/P-speedup).

M/P-speedup is 1.5 to 2.3 on average. Programs with a large
working set tend to show low M/P-speedup.

6.2.2 System Performance 

Tables 7 and 8 show the preliminary measurements of system
performance on PIM/m. The benchmark program is Pentomino.

The speedup saturation in Table 7 is caused by the small
problem size. Better speedup (234-fold speedup with 256
processors) was attained with the larger problem in Table 8.
It is also surprising that the small problem (executed in 1.1
seconds) shows a 95-fold speedup; it uses the multi-level
dynamic load distribution, distributing several thousand small
processes. These facts show an efficient language
implementation suitable for handling a lot of small-grain
processes with less overhead.
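
The speedup figures quoted here are the ratio of the 1-PE time to the 256-PE time in Tables 7 and 8. The check below also derives the parallel efficiency (speedup divided by the number of PEs), which is not given in the tables and is added only for illustration.

```python
def speedup(t1_ms, tn_ms):
    """Speedup of an n-PE run relative to the 1-PE run."""
    return t1_ms / tn_ms

small = speedup(107_238, 1_124)          # Table 7 (8x5 box), 256 PEs
large = speedup(24_285_015, 103_655)     # Table 8 (10x6 box), 256 PEs
efficiency_small = small / 256
efficiency_large = large / 256
```

The small problem reaches a speedup of about 95 (efficiency near 37%), the large one about 234 (efficiency above 91%), which is the saturation effect described above.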

7 Conclusion 

This paper described two subjects. One is an overview 
of the research and development on the parallel inference 
machine PIM and the language implementation of the 
kernel language KL1, a concurrent logic programming 
language. 

The other is the clarification of the features and advan- 
tages of KL1 language, its parallel implementation, and 
the hardware architecture from the viewpoint that the 
features are suitable and may be indispensable for effi- 
cient parallel processing of the dynamic and non-uniform 
problems with large computation. Knowledge processing 
is included in the problem domain. These problems have 
not been covered by commercial parallel machines and 
their software systems that target the scientific compu- 
tation. The PIM system focuses on this new domain of 
parallel processing. 

PIM is a distributed memory MIMD machine with a 
global view, connecting a maximum of 512 processors. 
It includes shared-memory substructures. Many compo- 
nent technologies have been developed that support effi- 
cient parallel processing on the target problem domain, 
especially on symbolic processing. 

The KL1 language also has very strong features for efficient
programming and execution of dynamic and non-uniform large
problems. Major features are (1) small-grain concurrent
processes, (2) implicit synchronization and communication,
(3) separation of concurrency design and mapping (load
allocation and scheduling), etc. They support highly
concurrent programming with complex structures and provide
great flexibility for load balancing. The efficient language
implementation made actual use of the language features
possible. The PIM and KL1 system have realized a strong
research and development environment for parallel software in
that problem domain.

Measurements and evaluations showed a very low- 
cost language implementation for handling small-grain 
concurrent processes and their remote communications. 
Good speedup by parallel processing on benchmark pro- 
grams was also reported. A lot of small-grain processes 
were handled during this processing. These results prove 
the efficiency and usefulness of the system to the dynamic 
and non-uniform problems. 

Further measurement and evaluation are continuing, and the
results will be reported soon. On the other hand, many
problems of parallel software remain unsolved. Continuous
research must be carried out to construct the real technology
of large-scale parallel processing for dynamic and non-uniform
problems, including knowledge information processing, in the
21st century. The parallel inference machine PIM and the KL1
language system will be utilized as the best research
environment.

Abstract

The Fifth Generation Computer Systems (FGCS) 
project is a national project of Japan, aiming at es- 
tablishing the basic technology required for high perfor- 
mance knowledge information processing systems. The 
parallel inference system subproject is aiming at estab- 
lishing parallel processing hardware technology for mas- 
sive processing power and software technology for effec- 
tive utilization of such hardware in the knowledge infor- 
mation processing field. The basic software system is re- 
sponsible for providing a programming language suited 
for describing knowledge information processing appli- 
cations software and providing a comfortable environ- 
ment for program execution and software development 
on highly parallel computer systems. 

A concurrent logic language with extensions to control 
program execution on parallel hardware was designed as 
the kernel language of the system. An operating sys- 
tem that provides a comfortable environment for parallel 
application software development was designed and im- 
plemented in the kernel language. This paper gives an 
overview of the research and development in this area in 
the FGCS project. 

1 Introduction 

The fifth generation computer systems project is a na- 
tional project of Japan, aiming at establishing the basic 
technology required for high-performance knowledge in- 
formation processing systems. The most important tech- 
nologies to be provided to attain the final objective of the 
project are the following two. 

• Problem solving methods for knowledge information 
processing 

• Processing power for implementation of the above 
methods 

The parallel inference system subproject is aiming at es- 
tablishing both hardware and software technologies for 
the latter. 

With the recent evolution of the hardware technology,
multiprocessor systems are expected to be advantageous not
only in absolute processing power but also in cost
effectiveness early in the next century. There seems to be no
other technology than multiprocessing to provide the
computational power required for high-performance knowledge
information processing systems.

The software technology for parallel processing, on the 
other hand, is still quite premature. In particular, the 
technology for building parallel software to solve com- 
plicated problems in the area of knowledge processing 
is far from satisfactory yet. This, we think, is at least 
partly due to the problems in the approach to the par- 
allel software technology conventionally taken, that is, 
trying to augment already available sequential process- 
ing technologies. A new system of software technology 
totally redesigned for parallel processing, including algo- 
rithms, programming languages and operating systems, 
has to be established. 

As the basis of this new technology, a concurrent logic 
language with extensions to control program execution 
on parallel hardware was designed as the kernel lan- 
guage of the system. An operating system that provides 
a comfortable environment for parallel application soft- 
ware development was also designed and implemented 
in the kernel language. This paper gives an overview of 
the research and development in this area of the FGCS 
project. 

In the following sections, the design principles are described
in section 2, the design of the kernel language in section 3,
and that of the operating system in section 4. Experiences
with the language and the operating system are described in
section 5. Direction of future work is suggested in section 6,
followed by concluding remarks.

2 Principles 

2.1 Middle-Out Approach

When designing a computer system, two extreme approaches can
be considered. One is a top-down approach, starting from the
problems to solve and gradually designing downwards to the
level of computer architecture or even to the level of
electronic devices, seeking at each level a design most
appropriate to implement the higher levels. The other is a
bottom-up approach, starting from available device
technologies, seeking the best use of the lower-level
technology, and finally finding an appropriate application
area.

Neither approach, however, can be successful
by itself. In the top-down approach, design at each
level requires insight into appropriate implementation of
all the lower level technologies. In the bottom-up approach,
design at each level requires insight into the upper
levels, up to the application areas appropriate for the chosen
design.

It is too difficult for anybody to have such insight for 
the broad and rather vague target of a long-term project, 
knowledge information processing. We thus decided to
take a middle-out approach: designing a certain intermediate
level first and then conducting research and development
in two directions, upwards and downwards,
simultaneously. It is not easy, of course, to find an appropriate
intermediate level and to actually design that level. This, 
however, seemed to be the only feasible approach for a 
project like this one. 

2.2 Kernel Language 

The intermediate level we chose was the level of program- 
ming languages. Choosing this level has the following 
merits. 

• The programming language level is not too far away
from either extreme, application software
or hardware implementation.

• Relatively rigorous specification in the programming 
language level can be given more easily than in other 
levels. 

The programming language designed to be the starting 
point of this middle-out approach is called the kernel 
language [Ueda and Chikayama 1990]. 

When the project started in 1982, language design
and implementation technology was still too immature
to fix the design of the kernel language. Thus, the research
started by investigating sequential systems first.
In the first stage of the project (fiscal years 1982-84), a
sequential kernel language based on Prolog, named ESP
[Chikayama 1984], was designed. It formed the basis
of most of the research and development
efforts in the first stage and early in the intermediate
stage.

Design of the next version of the kernel language, KL1,
was started simultaneously in the first stage. Its preliminary
design and implementation were done early in
the intermediate stage, and a fuller implementation on
an experimental parallel computer system was completed
within the intermediate stage (1985-88). The language
has been used through the final stage (1989-) for various
application research. In what follows, the kernel
language means this second generation kernel language,
KL1.



2.3 Logic Programming Principle 

The idea of logic programming formed the basis of the whole
project. The image of logic programming in the original
project plan seems to have been strongly influenced by
one particular language, Prolog. As the research proceeded
from sequential systems to parallel systems, we chose
a concurrent logic programming approach. The principle
of placing “logic” at the center of the design,
however, has been kept unchanged.

The principle of logic programming played an important
role in selecting a particular design among many
candidates. In designing the kernel language, its soundness
in the sense of mathematical logic has acted as
a “canon”, although we gave up pursuing completeness. 1
Many proposals to extend the kernel language with attractive
features were investigated but rejected because
of their unsoundness. On the other hand, features which
do not change the meaning of programs when interpreted
as logical formulas were more freely added to the
language. They have to do only with execution efficiency
and not with the correctness of programs, and
were clearly separated from the core part of the language.

These principles, based on the logical interpretation of programs,
have been quite helpful in keeping the language
design coherent and, as a consequence, in keeping its implementation
and its programming style coherent, as is described
in further detail below.

2.4 Target Architecture 

A processor with performance comparable to a full-size
computer, with a reasonable amount of memory, is now
available on a single circuit board. The recent evolution
of hardware technology shows a fourfold increase in
circuit density every three years. Extrapolating this,
one hundred processors with a reasonable amount of memory
are expected to reside on one chip early in the next
century. On the other hand, although the performance
of a single processor is steadily being improved, it might
be very difficult to attain an improvement of two orders of
magnitude within the same time period.
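
The extrapolation can be checked with a short calculation. The sketch below is written in Python purely for illustration. It takes the growth rate quoted in the text (density quadrupling every three years) and assumes, as a simplification of ours, that the number of processors fitting on a chip grows in proportion to density. The function names are hypothetical.

```python
import math

GROWTH_FACTOR = 4.0   # density multiplies by four ...
PERIOD_YEARS = 3.0    # ... every three years (rate quoted in the text)


def density_multiplier(years):
    """Relative circuit density after `years`, taking density 1.0 at year 0."""
    return GROWTH_FACTOR ** (years / PERIOD_YEARS)


def years_to_reach(multiplier):
    """Years needed for density to grow by the factor `multiplier`."""
    return PERIOD_YEARS * math.log(multiplier) / math.log(GROWTH_FACTOR)


if __name__ == "__main__":
    # Time needed for a hundredfold increase in density.
    print(round(years_to_reach(100), 1))
    # Density gained over nine years (three quadruplings).
    print(round(density_multiplier(9)))
```

Under these assumptions a hundredfold increase takes roughly ten years, which is consistent with the expectation of one hundred processors on a chip early in the next century.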

As higher density makes larger circuits practical,
design cost is beginning to dominate the total
cost of processors. The repeatability of design in multiprocessor
systems will give them a great cost advantage over a
complicated processor occupying one whole chip or more,
even if both systems had the same performance.
Early in the next century, multiprocessor systems will
thus be advantageous, not only in absolute processing
power, but also in cost effectiveness, even in small systems
such as palm-top or wristwatch-type computers.

1 Soundness of a system means that any results obtained are 
logical consequences of the given axiom set. Completeness, on the 
other hand, means that all logical consequences can be obtained.

For application areas such as knowledge information
processing that need non-uniform computation, an architecture
that allows flexible resource allocation is required.
For highly parallel systems, scalability of the
system architecture is critical. With these in mind, we
chose a homogeneous MIMD architecture with loosely-coupled
processors (or loosely-coupled clusters, each with
several tightly-coupled processors) as the target architecture
of the software system.

2.5 Level of the Kernel Language 

An ideal programming language should allow very high
level description, with an implementation optimizing it for
the target architecture without any human help. However,
with the current technology, such a language is
nothing more than a dream. This is especially so when
programs have to be optimized for execution on large-scale
loosely-coupled parallel computer systems where
communication delay is not negligible. The most difficult
part of the optimization is deciding where (on which
processor) to execute certain parts of the computation and
when (in which order). This problem is known as the
mapping problem.

As long as the problem solving techniques used are relatively
simple, the required computation can easily be predicted
beforehand, making static mapping by compilers feasible.
For knowledge information processing requiring sophisticated
problem solving methods, what to compute next
often depends on the results of earlier steps of the
computation, making static optimization of computation
mapping impossible. Many research results have shown
that a general-purpose automatic mapping algorithm is
hard to design and that the selection of good mapping algorithms
depends heavily on the problem solving method
used.

As knowledge information processing is an area where 
no single universal and efficient problem solving method 
is known, providing one single mapping algorithm is not 
appropriate. Providing many mapping algorithms that 
cover all the known methods may still be insufficient; as 
research in the area is still in an early stage, many novel 
problem solving methods are expected to be proposed in 
the near future. Thus, we set the level of the kernel lan- 
guage so that mapping of computation can be specified 
in programs. 

This decision of putting the responsibility of computa- 
tion mapping on programmers has the drawback of mak- 
ing programming a more complicated task. We, however, 
regard this additional effort as unavoidable and essen- 
tial in establishing the technology for high performance 
knowledge information systems. When a widely applica- 
ble mapping algorithm is established, it can be provided 
to the application users as a program library. With the 
kernel language capable of controlling program execu- 
tion, writing such a library should not be difficult. 



2.6 Designing a New Language 

It might have been possible to take an already existing 
logic programming language as the basis of the kernel 
language and extend it with several additional features 
for concurrent execution. The logic programming lan- 
guage used most widely was (and still is) Prolog, which 
was the primary candidate for such extensions. 

There were two possible ways to tailor Prolog into a language
for parallel systems. One was to provide implicit
and automatic computation mapping, which was not
adopted for the reasons described above. The other
was to make concurrent execution explicitly specified
with additional language constructs. However, as
the base language Prolog was designed for sequential processing,
concurrency specification would add
complexity to the language and make programs harder
to understand. More importantly, if sequential execution
were the default principle, it would
be more difficult to reorganize programs for better
mapping, as different mappings require different parts
of programs to run concurrently.

Another problem with such a language was the difficulty of
specifying synchronization. In programming languages
in which synchronization is specified independently of
conditioning, problems arise when decisions on conditional
execution are made on incomplete data. On physically
parallel hardware, finding such problems can become
very painful because the same phenomenon is often
hard to reproduce. To solve this problem, synchronization
and conditioning should not be separated.

We decided that the kernel language should be de- 
signed from scratch so that concurrent execution could 
be expressed in a natural way. The language should have 
intrinsic concurrency: language constructs imply concur- 
rent execution in principle and sequencing is explicitly 
described. Synchronization should be integrated with 
conditioning in the language construct. 

2.7 Designing a New Operating System

Even though the prototype parallel inference system is 
an experimental system, an operating system that pro- 
vides a comfortable software development environment 
was mandatory. One way to provide the required func- 
tionality might have been to port an already existing 
operating system to the parallel inference machine. 

All the operating systems available then (and probably 
most of them even now) were designed originally for se- 
quential systems and augmented afterwards with certain 
primitives for execution on parallel systems. 

There were two major problems with such systems.
One was that the interface of the operating system with
user programs was still based on sequencing. For
example, the user program is notified of the completion of a
requested service by the completion of a procedure
(a supervisor call) in the user’s thread of execution.
This is acceptable in systems where application software
is written in basically sequential languages. It
would not, however, go well with software written in the kernel
language with its intrinsic concurrency.

Another problem was that the management policies 
of such operating systems were highly optimized for se- 
quential processing. In sequential systems or small-scale 
parallel systems, centralization of all the management 
information is usually the most robust and efficient pol- 
icy. This, however, is far from optimal for highly parallel 
systems. If the management were centralized on one pro- 
cessor in a highly parallel system, that processor would 
be responsible for too much management work and would 
be the bottleneck of the whole system. Moreover, every
activity within the system would require communication
to and from that processor, resulting in a communication
bottleneck.

We concluded that designing an operating system op- 
timized for highly parallel systems was also an unavoid- 
able and essential part of the technology for high per- 
formance knowledge information systems and decided 
to design and implement a new operating system from 
scratch. The user interface should be consistent with the 
design of the kernel language; sequencing should not be a 
part of the design of the interface. Distribution of man- 
agement was essential to avoid bottlenecks, which might 
also affect the specification of the services provided by 
the operating system. 

3 Kernel Language: KL1 2 

The kernel language KL1 has two layers. The basic layer
is defined by Guarded Horn Clauses (GHC), a
concurrent logic language for describing what computation
to perform to obtain the desired result, that is, for describing
correct programs. Such a description imposes only those constraints
on the mapping of computation that are required
to obtain the desired result. Built upon this layer is the
full KL1 language for describing how such computation
should actually be carried out with the desired mapping of
computation, that is, for describing efficient programs.
This separation of correctness and efficiency issues or, in
other words, of concurrency and parallelism, seems to play
an important role in bridging the gap between parallel
inference systems and knowledge information processing
in a coherent manner.

3.1 Concurrent Logic Language GHC 

This section describes the design of a concurrent logic
language, Guarded Horn Clauses, which forms the basis
of the kernel language KL1.

2 This section is a rewrite of an article co-authored with
Kazunori Ueda [Ueda and Chikayama 1990], except for
subsection 3.3.

Figure 1: Two Layers of the Kernel Language

3.1.1 Concurrent Logic Languages 

The design effort for the kernel language started in
1982, at the start of the project, with a search for an appropriate
framework for the language. As the concurrent
logic programming framework seemed to provide the
characteristics we needed, we investigated many languages
in the family as the basis of the kernel language,
including Relational Language [Clark and Gregory 1981],
Concurrent Prolog [Shapiro 1983] and PARLOG [Clark
and Gregory 1983]. This study led us to design a
new concurrent logic language, Guarded Horn Clauses
(GHC), at the end of 1984 [Ueda 1986].

GHC shares its basic framework with other concur- 
rent logic languages. Firstly, a GHC program is a set of 
guarded clauses. Secondly, GHC features no don’t-know 
nondeterminism (built-in search capability) but features 
don’t-care nondeterminism, which allows description of 
reactive systems. Reactive systems in concurrent logic 
languages are based on the process interpretation of logic 
[van Emden and de Lucena Filho 1982], in which a goal 
(or a multiset of subgoals derived from it) is regarded 
as a process and processes communicate by generating 
and observing bindings (between shared logical variables 
and their values). As in most concurrent logic languages,
all bindings are determinate in GHC, that is, they are
never revoked once published to other processes. The
determinacy of bindings is essential in reactive systems,
such as an operating system, because the bindings may
be used for interacting with the real outside world. The
lack of built-in search capability also allows programs to
specify the way they are executed in more detail, which
also matches our principle of making programs specify
the mapping of computation.

3.1.2 Guarded Horn Clauses

What then is the relative merit of GHC over other con- 
current logic languages? In our study of various concur- 
rent logic languages, we focused on Concurrent Prolog, 
which was the most expressive of them, and built its 
prototype implementation [Miyazaki et al. 1985]. The 
experience led us to clarify the definition of atomic op- 
erations of the language, which in turn led us to a new 
language with simpler atomic operations. 

As explained above, one important aspect of concur- 
rent logic languages is the determinacy of bindings. In 
general, the execution of a concurrent logic program pro- 
ceeds using parallel input resolution [Ueda 1988a] that 
allows parallel execution of different goals, but under the 
following rules to guarantee the determinacy of bindings: 

(1) The guards (including the heads) of different clauses 
called by a goal g can be executed concurrently, but 
they cannot instantiate g. 

(2) The goal g commits to one of the clauses whose 
guards have succeeded. 

(3) The body of a clause to which g has committed can 
instantiate g. The bodies of clauses to which g has 
not committed cannot instantiate g or the guards of 
the clauses. 

(4) A goal is said to succeed if it commits to some clause 
and all its body goals succeed. 

That is, before commitment, a goal can pursue two 
or more clauses but without generating bindings. Af- 
ter commitment, it can generate bindings but only one 
clause is left. 

Another important aspect of concurrent logic lan- 
guages is how synchronization is achieved. In general, 
synchronization is achieved by restricting information 
flow caused by unification. Concurrent Prolog uses read- 
only annotations, and PARLOG uses mode declarations 
which are used for compiling the unification of input ar- 
guments into a sequence of one-way unification and test 
unification primitives. However, in these languages, ad- 
ditional mechanisms are necessary to guarantee restric- 
tion (1) above. 

The key idea of GHC is quite simple. It uses the re- 
striction (1) itself as a synchronization construct. That 
is, any piece of unification which is invoked directly or 
indirectly from the guard of a clause C and which would 
instantiate the caller of C is suspended until it can be ex- 
ecuted without instantiating the caller. In other words, 
GHC has integrated two notions: the determinacy of 
bindings and synchronization. 
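
This suspension rule can be imitated in a few lines of Python. The sketch below is only a toy model under our own assumptions, not part of any GHC implementation: `Var` stands for a logical variable, and the two-clause program `choose` is a hypothetical example. A reduction attempt suspends while the caller's argument is unbound, and the body binds the output only after commitment.

```python
class Var:
    """A single-assignment logical variable shared between goals."""

    _UNBOUND = object()

    def __init__(self):
        self._value = Var._UNBOUND

    def is_bound(self):
        return self._value is not Var._UNBOUND

    def get(self):
        if not self.is_bound():
            raise ValueError("variable is unbound")
        return self._value

    def bind(self, value):
        # Bindings are determinate: once published they are never revoked.
        if self.is_bound():
            raise ValueError("variable is already bound")
        self._value = value


# Hypothetical two-clause program, written in GHC style:
#   choose(a, R) :- true | R = 1.
#   choose(b, R) :- true | R = 2.
CLAUSES = {"a": 1, "b": 2}


def try_choose(x, r):
    """Attempt one reduction of the goal choose(X, R)."""
    if not x.is_bound():
        # Matching either clause head would instantiate the caller's X,
        # so the attempt suspends instead of binding it.
        return "suspended"
    if x.get() not in CLAUSES:
        return "failed"
    # Commitment has happened; only now may the body bind R.
    r.bind(CLAUSES[x.get()])
    return "committed"


def demo():
    x, r = Var(), Var()
    before = try_choose(x, r)   # X is not yet known
    x.bind("b")                 # another process publishes X = b
    after = try_choose(x, r)
    return before, after, r.get()
```

Calling `demo()` first observes a suspension and then, once `X` has been bound elsewhere, a commitment to the second clause. No separate synchronization primitive is involved: waiting for data and choosing a clause are the same operation.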



A kernel language must provide a common framework 
for people working on various aspects of the project in- 
cluding applications, implementation, and theory. Be- 
fore accepting GHC as the basis of our kernel language, 
we had to convince ourselves that it satisfies the follow- 
ing conditions: 

• It is expressive enough. 

• It can eventually be implemented efficiently, possi- 
bly by appropriate subsetting. 

• It is simple enough to be understood and used by 
programmers. Also, it is simple enough for theoret- 
ical treatment. 

We soon made sure that GHC was expressive enough 
to write most concurrent algorithms that had been writ- 
ten in other concurrent logic languages, but that was 
not enough. How to program search problems was also 
important, because search problems are a specialty of or- 
dinary logic languages. So we have developed a couple of 
methods for programming search problems [Ueda 1987], 
[Tamaki 1987], [Okumura and Matsumoto 1987]. 

For implementability, we quickly ascertained by rapid 
prototyping that GHC can be implemented fairly ef- 
ficiently at least on sequential computers [Ueda and 
Chikayama 1985]. 

3.1.3 Flat GHC 

For simplicity, we continued to study the properties of 
GHC and looked for a simpler explanation of the lan- 
guage better suited to process interpretation. Now, our 
interpretation is that a GHC process is an abstract entity 
which observes and generates information (represented 
in the form of bindings) and which is implemented by a 
multiset of body goals. The behavior of each body goal 
is defined by guarded clauses that can be regarded as 
rewrite rules. 

A problem with the original definition of GHC is that 
guard goals do not fit well into this process interpreta- 
tion. We also felt, from a practical point of view, that 
the expressive power of guard goals did not justify the 
implementation effort even if it could be implemented
efficiently.

These considerations led us to reduce GHC to a sub- 
set, Flat GHC. Guard goals of Flat GHC are auxiliary 
conditions to be satisfied for applying the clause. The 
sufficient conditions to be satisfied by a guard goal as 
an auxiliary condition are that it is deterministic (that 
is, whether it succeeds or not depends only on its argu- 
ments) and that it does not produce any bindings. This 
restriction simplified the theoretical treatment consider- 
ably in the operational semantics [Ueda 1990] and pro- 
gram transformation rules [Ueda and Furukawa 1988]. 

To summarize, a Flat GHC program is a set of guarded
clauses that can be regarded as rewrite rules of goals.
The guard of a clause specifies what information should
be observed before applying the rewrite rule, and the
body specifies the multiset of goals replacing the original.
A body goal is either a unification goal of the form t1 = t2,
whose behavior is language-defined, or a non-unification
goal, whose behavior is user-defined. A unification body
goal generates information by unifying t1 and t2, and a
non-unification body goal represents the rest of the work
and will be reduced further.



3.1.4 Characteristics of GHC 

The semantics of Flat GHC can be understood both alge- 
braically and logically. The algebraic one is the process 
interpretation mentioned above. A logical characteriza- 
tion of communication and synchronization was given 
by Maher [Maher 1987], showing that information com- 
municated by processes can be viewed as equality con- 
straints over terms. 

Unlike Concurrent Prolog but like PARLOG, the pub- 
lication of bindings is not done atomically upon com- 
mitment of a non-unification goal but eventually after 
commitment using a unification body goal that can run 
in parallel with other goals. This means that commit- 
ment in GHC is a smaller and simpler operation than in 
Concurrent Prolog. Moreover, in GHC, the information 
generated by a unification body goal is not an atomic 
entity but can be transmitted in smaller pieces, possi- 
bly with communication delay. We have found that this 
liberal computational model of (Flat) GHC is expressive 
enough to program cooperating concurrent processes and 
leaves more freedom to implementation. 

Another point to note is that GHC has included con- 
trol for the correct behavior of processes but excluded 
any control for efficient execution. GHC has left the 
latter to KL1, described below, in order to clearly distinguish
between the two notions. This contrasts with
PARLOG, which features sequential AND that can be 
used for suppressing parallel execution of body goals. We 
believe that it is important to learn that synchronization 
based on information flow is sufficient for writing correct 
concurrent programs. 

Important topics on theoretical aspects of Flat GHC 
include the relationship with other theoretical models of 
concurrency such as CCS [Milner 1989] and theoretical 
CSP [Hoare 1985]. Although concurrent logic languages 
differ from CCS and CSP in their asynchronous commu- 
nication and dynamically reconfigurable processes, sim- 
ilar mathematical techniques can be used to formalize 
them. We have not yet obtained a completely satisfac- 
tory formal semantics, but we are fairly confident that 
Flat GHC is theoretically simple enough, while it can be 
used for practical programming without any modifica- 
tion. 



3.2 Practical Parallel Language KL1

As described above, we have designed a concurrent logic 
language Flat GHC as the basis of the kernel language. 
The descriptive power of the language, however, is not 
sufficient when efficient program execution is our con- 
cern, which was the original motivation of parallel com- 
puters. 

As Flat GHC programs do not say anything about 
where (i.e., on which processor) the atomic operations
making up a computation should be performed, there 
are many ways to distribute the operations over avail- 
able processors. As Flat GHC programs only specify the 
partial ordering of atomic operations, there are many 
possible total orderings conforming to it. To make sure 
that the distribution and the ordering employed are not 
far from optimal, we must be able to specify physical 
details of execution to some extent. 

We thus designed a parallel programming language 
based on the concurrent programming language Flat 
GHC, in which we can specify in certain detail how a 
program should be executed. This section describes the 
outline of this language, named KL1.

3.2.1 Mapping of Computation 

Flat GHC programs implicitly express any potential par- 
allelism in the sense that no ordering between atomic op- 
erations exists except for those essential for correctness. 
On real-world computer systems with a limited number 
of processors and non-negligible cost of interprocessor 
communication, faithful exploitation of this parallelism 
will almost never show optimal efficiency. To achieve rea- 
sonable efficiency, control is required on when and where 
each atomic operation should be performed. This control 
is called mapping. 

Mapping is often implicit in sequential systems. Given
two possible methods to solve a problem, a good strategy
on a sequential system would be to try the more efficient but
less reliable one first, and to try the less efficient but reliable
one only when the first one fails. This may not
be the best strategy for parallel systems when the first method
does not require all the computational resources (such as
processors) for its execution. In such a case, the second
method should be tried in parallel with the first. This
computation may or may not be required, depending on
the result of the first method. Such computation is called
speculative [Burton 1985]. For efficiency, computation by
the second method should not interfere with the execution of
the first by snatching required resources. This is achieved
by giving priority to the first method over the second.
From this viewpoint, the original sequential algorithm
uses sequencing of the two methods not for correctness but
for efficiency, to implicitly specify priority.
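
The effect of priority on speculative computation can be illustrated with a small simulation. The Python sketch below is an idealized model of our own: time advances in ticks, each method needs a fixed number of steps, and on every tick the available processors go to the highest-priority unfinished methods. The step counts and the names `fast` and `slow` are hypothetical.

```python
def run(methods, processors):
    """Simulate priority scheduling of alternative methods.

    methods: list of (priority, name, steps_needed, succeeds) tuples.
    On every tick the `processors` highest-priority unfinished methods
    each advance by one step. Returns (winner, ticks, steps_done).
    """
    by_priority = sorted(methods, key=lambda m: -m[0])
    order = [name for _, name, _, _ in by_priority]
    remaining = {name: steps for _, name, steps, _ in methods}
    succeeds = {name: ok for _, name, _, ok in methods}
    done = {name: 0 for name in order}
    ticks = 0
    while any(remaining[n] > 0 for n in order):
        ticks += 1
        active = [n for n in order if remaining[n] > 0][:processors]
        for n in active:
            remaining[n] -= 1
            done[n] += 1
        for n in active:
            if remaining[n] == 0 and succeeds[n]:
                # A solution was found; any remaining work is abandoned.
                return n, ticks, done
    return None, ticks, done


# The efficient but unreliable method has the higher priority.
FAST_FAILS = [(2, "fast", 3, False), (1, "slow", 5, True)]
FAST_WORKS = [(2, "fast", 3, True), (1, "slow", 5, True)]
```

With one processor, `run(FAST_FAILS, 1)` reproduces the sequential strategy: the slow method starts only after the fast one has failed. With two processors, the slow method runs speculatively on the otherwise idle processor and the answer arrives earlier, while `run(FAST_WORKS, 2)` shows that the speculative work never delays the preferred method.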

Sometimes more sophisticated mapping is desirable.
Suppose that there are two methods to solve a problem
and that, although at least one is known to find a solution
efficiently, we cannot tell which beforehand. In
such a case, the best scheduling strategy may be to give
both methods approximately the same amount of computational
resource. Resource management is thus an
important part of an algorithm in parallel computation.

Figure 2: Shoen Construct

In sequential computer systems, and in parallel computer
systems built as extensions of conventional sequential
systems, operating systems are primarily responsible for
mapping. This is acceptable as long as application programs
are mostly sequential and the mapping strategy is
mostly specified implicitly by sequencing. In parallel systems,
where explicit mapping operations are required much more
frequently, requesting each mapping operation
from the operating system would incur intolerable overhead.

3.2.2 Mapping Features of KL1

To solve this problem, we have introduced into KL1 the
following features, which are intended to be efficiently
implemented:

Shoen: Shoen 3 represents a group of goals. This group
is used as the unit of execution control, namely the
initiation, interruption, resumption and
abortion of execution. Exception handling and resource
consumption control mechanisms are also provided
through the shoen construct. It has two communication
streams as its interface: one, called the control stream,
is directed from outside the shoen and carries
messages to control execution in the shoen; the other,
called the report stream, runs in the reverse direction and
reports events internal to the shoen. The shoen construct
is an extension of the metacall construct proposed by
Clark and Gregory [Clark and Gregory 1984].

Priority: A (body) goal of a KL1 program is the unit of
priority control. Each goal has an integer priority associated
with it. Each shoen keeps the maximum and
the minimum priorities allowed for goals belonging to
it, and the priority of each goal is specified relative to
these. The language provides a large number of logical
priority levels, which are translated to physically
available priority levels provided by each implementation.

3 Shoen is a Japanese word corresponding to ‘manor’ in English.

Processor specification: Each (body) goal may have 
a processor specification, which designates the proces- 
sor (or a group of processors) on which to execute the 
goal. 

This straightforward mechanism provides the basis 
of research in more sophisticated computation map- 
ping strategies. Actually, several automatic mapping 
strategies have been developed for diverse problems, 
and relatively universal ones are provided as libraries 
[Furuichi et al. 1990]. 
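
The two-step treatment of priorities described above (relative to the shoen's range, then logical to physical) can be sketched as follows. This is a Python illustration under our own assumptions: the text does not give the actual ranges or the translation rule, so the 4096 logical levels, the 16 physical levels and both function names are hypothetical.

```python
def logical_priority(shoen_min, shoen_max, fraction):
    """Logical priority lying `fraction` (0.0 to 1.0) of the way through
    the range of priorities that the enclosing shoen allows."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0.0 and 1.0")
    return shoen_min + round(fraction * (shoen_max - shoen_min))


def physical_priority(logical, logical_levels, physical_levels):
    """Translate a logical level (0 .. logical_levels - 1) to one of the
    physically available levels, preserving the relative order."""
    if not 0 <= logical < logical_levels:
        raise ValueError("logical priority out of range")
    return logical * physical_levels // logical_levels


if __name__ == "__main__":
    # A goal in the middle of a shoen allowed priorities 1024 .. 2048.
    level = logical_priority(1024, 2048, 0.5)
    print(level, physical_priority(level, 4096, 16))
```

Because several logical levels share one physical level, goals with slightly different logical priorities may be scheduled identically on a given implementation. This is acceptable only because such specifications affect efficiency and not correctness.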

One of the most notable characteristics of the KL1 language
is that these priority and processor specifications
are separated from concurrency control. We call these 
specifications pragmas. Pragmas are merely guidelines 
for language implementations and may not be precisely 
obeyed. The same is true of the controlling mechanism 
of shoen; abortion of computation, for example, may not 
happen immediately. This relaxation makes distributed 
implementation much easier. 

In many parallel programming languages, the specifi- 
cation of parallel execution is often mixed up with other 
language constructs, especially with constructs for con- 
currency control. A major revision is often required for 
revising only the mapping of computation to improve 
efficiency, which is liable to introduce new bugs. 

Although pragmas are specified within the program 
in KL1, they are clearly distinguished syntactically from
other language constructs. Pragmas will never change 
the correctness of the programs, 4 though the perfor- 
mance may change drastically. As it is not uncommon 
that more than half of the effort to develop a program is 
devoted to the design of appropriate mapping, it is most 
advantageous that mapping specifications can be altered 
without affecting correctness of the program. 

3.2.3 Keeping up with Sequential Languages 

What criterion is appropriate for comparing parallel al- 
gorithms? Assume that a parallel algorithm has sequen- 
tial execution time c(n) (n being the size of the prob- 
lem) and average potential parallelism p(n). Then the 
total execution time by this algorithm on an ideal par- 
allel computer is given by c(n)/p(n). This means that 
an algorithm with more sequential execution time but 
with still more parallelism is considered to be a better 
algorithm on an ideal parallel computer. 

4 To be precise, the priority specification may be used for guar- 
anteeing certain properties of diverging (i.e., autonomously non- 
terminating) programs. 
This, however, does not hold when the potential
parallelism, which may vary over time, can exceed the
physically available parallelism. As physical parallelism is
always limited in the real world, a parallel algorithm with
sequential time complexity worse than a sequential
algorithm will be beaten by that sequential algorithm for
sufficiently large n, no matter what p(n) is. To summarize,
to be really useful, parallel languages must be able to
express any algorithm with the same sequential time
complexity as in sequential languages.
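
The argument can be checked numerically. The following Python sketch is only an illustration: the two cost functions, the processor count of 64 and the names `sequential_time` and `parallel_time` are assumptions chosen for the example, not measurements of any real system.

```python
import math

def sequential_time(n):
    """Cost of a hypothetical O(n log n) sequential algorithm."""
    return n * math.log2(n)

def parallel_time(n, processors):
    """Cost of a hypothetical parallel algorithm with sequential
    time c(n) = n^2 and potential parallelism p(n) = n, executed
    on a machine with a fixed number of processors."""
    c = n * n
    p = n
    usable = min(p, processors)  # physical parallelism caps p(n)
    return c / usable

P = 64  # assumed number of physical processors
for n in (16, 64, 1024, 1 << 20):
    print(n, sequential_time(n), parallel_time(n, P))
```

With as many processors as p(n), the parallel algorithm takes c(n)/p(n) = n steps and always wins; with only 64 processors it still wins for small n but loses once n grows well beyond the processor count.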

Pure languages such as pure Lisp and pure Prolog can- 
not express certain kinds of efficient algorithm due to 
the lack of the notion of destructive assignment. GHC 
also is a pure language with the same inherent problem. 
To write efficient algorithms in these pure languages, we 
must be able to somehow mimic the efficiency of array 
operations in conventional languages. 

For this reason, KL1 introduced a primitive for updating
an array element in constant time without disturbing
the single-assignment property of logical variables. The
primitive can be used as follows:

    set_vector_element(Vect, Index, Elem, NewElem, NewVect)

When an array Vect, an index value Index and a new el- 
ement value NewElem are given, the predicate binds Elem 
to the value of the Index’th element of Vect, and New- 
Vect to a new array which is the same as Vect except 
that the Index’th element is replaced by NewElem. 

Because some other goals may still have references to
the old array Vect, a naive implementation might
allocate a completely new array for NewVect and copy all
but one of the elements. However, when it is known that no
goals other than the above set_vector_element goal have
references to Vect, there will be no problem in destructively
updating it. In the actual implementation of KL1, a
simplified, efficient version of the reference counting scheme
[Chikayama and Kimura 1987] detects such a situation,
in which event the new array NewVect is obtained in
constant time.
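
The idea can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the KL1 implementation: the class `Vector`, its full integer counter `refs` and the function below are hypothetical, whereas the actual scheme is described above as a simplified form of reference counting.

```python
class Vector:
    """A toy vector carrying an explicit reference count."""

    def __init__(self, items):
        self.items = list(items)
        self.refs = 1  # number of goals holding a reference


def set_vector_element(vect, index, new_elem):
    """Return (elem, new_vect), mirroring the primitive described above.

    The call consumes the caller's reference to `vect`. If that was
    the only reference, the vector is updated in place in constant
    time; otherwise a copy is made so that the remaining holders
    still see the old contents.
    """
    elem = vect.items[index]
    if vect.refs == 1:
        vect.items[index] = new_elem  # destructive update, O(1)
        return elem, vect
    vect.refs -= 1                    # caller gives up its reference
    new_vect = Vector(vect.items)     # O(n) copy
    new_vect.items[index] = new_elem
    return elem, new_vect


single = Vector([10, 20, 30])
old, updated = set_vector_element(single, 1, 99)

shared = Vector([1, 2, 3])
shared.refs = 2                       # pretend another goal also holds it
_, copied = set_vector_element(shared, 0, 7)
```

In the single-reference case `updated` is the very same object as `single`; in the shared case `shared` keeps its old contents and `copied` is a fresh vector.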

This means that any imperative sequential algorithm
can be rewritten in KL1 retaining the same computational
complexity, as random access memory can always
be emulated using a single-reference array. Of course,
allowing only one reference to a data structure can decrease
the possibility of parallel execution considerably.
However, this requirement of the computational complexity
becomes essential only after physically available
parallelism is used up.

3.3 Higher-Level Languages 

Although the kernel language KL1 allows a relatively
higher-level description of programs than imperative
languages, its description level is about the same as that of
Lisp, which is still too low for certain application programs
in the area of knowledge information processing. This
section describes research on providing higher-level
language constructs upon KL1.

3.3.1 Macro Expansion 

A powerful macro expansion mechanism similar to the
one available in ESP [Kondoh and Chikayama 1988] has
been designed and implemented. This mechanism allows not
only in-place expansion of macro invocations but also
insertion of terms into the program at the level of arguments,
goals or clauses. The following are possible using these
features.

• Simple in-place expansion 

• Conditional compilation 

• Functional notations including but not restricted to 
arithmetical expressions 

• Implicit arguments 

A goal in a Flat GHC program has a very short lifetime, 
as it consists of only one reduction to its subgoals. To 
realize a process with longer lifetime, a programming 
style is used in which a goal recursively calls the same 
predicate with almost the same arguments. This pro- 
gramming style is used almost everywhere in the oper- 
ating system and application programs. In such a pro- 
gramming style, the state of the process or any paths 
to communicate with other processes (shared variables) 
have to be passed as the arguments of the recursive goal. 
This ensures higher modularity, but always describing 
such arguments is too verbose, making it harder to un- 
derstand or to revise programs. The implicit argument 
passing mechanism can be conveniently used to describe 
processes in a more concise manner. 

The macro expansion mechanism of KL1 is so powerful
that functions beyond mere syntactic sugaring can
be provided using its features. However, programmers
can still freely choose any programming style allowed in KL1.
Although this is advantageous in certain cases, restricting
the use of language features helps
in making programs easier to understand and maintain.
We thus started designing higher-level languages to be
compiled into KL1, which will be described in the
following sections.

3.3.2 A’UM 

The programming style of KL1 most frequently used is to
describe a set of processes communicating through
message streams [Shapiro and Takeuchi 1983]. Streams are
realized by gradually instantiating a list structure
consisting of binary cells. Processes are realized using tail
recursion. A’UM is a programming language designed
to describe such programs more directly than by explicitly
writing out such realizations of message streams and processes
[Yoshida and Chikayama 1990].
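
The stream-and-process style can be mimicked in Python. The following is a rough model, not KL1 code: the message names and the function `counter` are hypothetical, a Python list stands in for an incrementally instantiated stream, and the reply tuple stands in for the variables through which a KL1 process would answer.

```python
def counter(stream, count=0, replies=()):
    """A process written as a recursive consumer of a message stream.

    Each call plays the role of one reduction: it takes the head
    message, computes the next state, and calls itself on the tail
    with that state passed explicitly as an argument.
    """
    if not stream:                       # end of stream: process terminates
        return list(replies)
    head, tail = stream[0], stream[1:]
    if head == "up":
        return counter(tail, count + 1, replies)
    if head == "down":
        return counter(tail, count - 1, replies)
    if head == "show":
        return counter(tail, count, replies + (count,))
    return counter(tail, count, replies)  # ignore unknown messages


result = counter(["up", "up", "show", "down", "show"])
```

The state `count` and the communication path `replies` have to be threaded through every recursive call, which is the verbosity that implicit arguments and the higher-level languages aim to remove.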

A prototype implementation of the language was a
translator to KL1. As A’UM is a thoroughly object-oriented
language, every entity of the language, an integer
value for example, appears as a process. We could find
no other way than to actually implement them as
processes in KL1. The choice then was whether to abandon
thorough object-orientation or to implement it
differently, not as a part of the parallel inference system.
A’UM took the latter choice and research on its more
direct implementation is ongoing [Konishi et al. 1992].
A prototype implementation is already operational on a
system of network-connected workstations. The former
approach was taken by another language with similar
objectives, called AYA, which is described in the next
section.

3.3.3 AYA 

The design of the language AYA was initiated after we
decided to let A’UM pursue pure object-orientation
rather than practical efficiency on the parallel
inference system [Susaki and Chikayama 1991].

The design objective of AYA is the same as the initial
motivation for designing A’UM, namely, providing a more
concise way to describe programs in the object-oriented
programming style of KL1. In the design of AYA, a higher
priority is given to practical efficiency and freedom of
description than to uniformity as an object-oriented language.
Not all entities are “objects”: integers will not respond
to “add” messages. Its design was mostly bottom-up;
most of the language features were chosen based on our
programming experiences in KL1.

Processes of AYA can have multiple streams to receive
messages, making it impossible to interpret one single
message stream as representing an object.
Communication patterns besides streams, such as asynchronous
interrupts, are also allowed.

A characteristic feature of AYA is the notion of scenes, 
corresponding to the macroscopic context of a process. 
A process can have many scenes to act in and its reaction 
to messages from outside will depend on in which scene 
it is currently acting. 

Implementation effort of AYA is ongoing and a proto- 
type translator to KL1 is already operational. 

4 Operating System: PIMOS 

As described above, an operating system tuned to
control highly parallel programs effectively is vital for fully
exploiting the power of highly parallel computer
systems. The system should also be user-friendly and robust
enough for practical and extensive use in parallel
software research. The Parallel Inference Machine Operating
System (PIMOS) was designed to fulfill these
requirements and implemented in the kernel language. This
section describes the overall design of PIMOS.

4.1 Prior Works 

The possibility and advantages of writing a complete
operating system in a concurrent logic language were
suggested by Shapiro [Shapiro 1986]. Based on this principle
but with many improvements in various aspects, several
experimental systems such as the Logix system [Hirsch
et al. 1987] and the Parlog Programming System (PPS)
[Foster 1987] were implemented.

PIMOS resembles PPS in many aspects. This
resemblance is partly due to the resemblance of the
implementation languages (KL1 and PARLOG) and partly due to
frequent exchange of ideas between the two groups.

A notable difference between PIMOS and the other
above-mentioned systems lies in the underlying language
implementations and the way the system is used.
PIMOS is designed to be efficiently executed on parallel
hardware and to be practically used in the research and
development of application software, while the other systems
are built as experimental systems upon commercially
available systems. In other words, PIMOS shares with
the other systems the objective of seeking a novel method
of constructing an operating system in a concurrent logic
language, but has an additional objective of providing
a comfortable and efficient environment for application
software development. This considerably affected
various design trade-offs.

4.2 Objectives 

In designing PIMOS, the following items were set as the 
design objectives. 

Robustness: As PIMOS is to be used on a stand-alone
parallel computer system, the robustness of the system
is more important than in systems built upon another
established system.

Internal Parallelism: The ultimate objective of
PIMOS is, as stated above, to provide features for fully
exploiting the power of parallel inference hardware.
The various computations required in such an operating
system should also be executed in parallel.
Otherwise, the operating system will be the bottleneck of
the whole system.

High Locality: The target architecture has loosely- 
coupled processors where inter-processor communica- 
tion is much more costly compared with communica- 
tion within one processor. Thus, the amount of com- 
munication between processors should be kept as low 
as possible. 

Flexibility: As the hardware parameters are expected
to change, the system should have enough flexibility
to be tuned to the given parameters. When tuning by
changing parameters of the operating system becomes
insufficient, non-trivial re-design of the system may be
required. Thus, a system that is easy to improve
is desirable.

4.3 Resource Management 

Management of resources is the most fundamental and 
important role of an operating system. This section de- 
scribes the design of the resource management mecha- 
nism of PIMOS. 5 

4.3.1 What Resources to Manage 

In conventional systems, memory management and
process management are the most important tasks of
operating systems. As in other high-level languages for
symbolic manipulation, KL1 provides an automatic memory
management feature including garbage collection. Thus,
basic memory management is handled by the language
implementation rather than by PIMOS. As KL1 provides implicit
concurrency and data-flow synchronization, context
switching and scheduling are already supported by the
language. Thus, PIMOS does not have to manage low-level
fine-grained processes, but controls larger-grained groups
of processes using the shoen feature of the kernel
language.

On the other hand, PIMOS has full responsibility for
the management of resources such as input and output
devices. At the lowest level, I/O devices are provided
as primitives of the kernel language to control the
physical device interface. Thanks to the descriptive power
of the kernel language for reactive systems, such devices
appear as ordinary processes at the kernel
language level. Their functionality, however, is at a level too
low for application programs. Like any other operating
system, PIMOS virtualizes such devices, allowing
application programs to control virtual devices with much
higher-level functionality.

Each of these virtual devices is actually a process that
converts higher-level requests from user tasks into
lower-level requests that physical devices can understand. The
user tasks send their request messages to a stream
connected to such a process. Thus, management of devices
is management of the communication streams connected
to them. Protection mechanisms are realized by
inserting a filtering process into such streams, which examines
messages going through the stream and rejects any illegal
requests to the devices.
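
A minimal Python sketch of this arrangement follows. It is an illustration only: the request names (`write_line`, `put_char`, `format_disk`) and both functions are hypothetical, and Python generators stand in for KL1 streams and processes.

```python
def protection_filter(requests, allowed, rejected):
    """Pass legal requests downstream and divert illegal ones.

    `requests` is the user's message stream; `rejected` collects
    messages that never reach the device.
    """
    for request in requests:
        if request[0] in allowed:
            yield request
        else:
            rejected.append(request)


def virtual_device(requests):
    """Convert higher-level requests into lower-level device requests."""
    low_level = []
    for request in requests:
        if request[0] == "write_line":
            text = request[1] + "\n"
            low_level.extend(("put_char", ch) for ch in text)
    return low_level


rejected = []
user_stream = [("write_line", "hi"), ("format_disk",), ("write_line", "ok")]
filtered = protection_filter(user_stream, {"write_line"}, rejected)
device_requests = virtual_device(filtered)
```

Because the filter sits on the stream itself, the device process never has to check who sent a message; anything that reaches it has already been vetted.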

As mentioned above, process management by PIMOS
is through the shoen construct. PIMOS virtualizes shoen
also, as a task with higher-level functionality for resource
management. A task is a virtual device with the
function of program execution with a resource management
facility. Tasks can be controlled from user programs
only through the streams connected to them. The same
protection mechanism of inserting message filtering processes
is used here.

5 More detailed description can be found in [Yashiro et al. 1992].

4.3.2 Hierarchical Resource Management 

In most conventional operating systems, all the vital 
management information is centralized to the kernel, 
which is usually implemented as a single process. This 
centralization policy makes it easy to keep the manage- 
ment information consistent. 

In a highly parallel system, however, such
centralization of management information would become
problematic. Even if the overhead of the kernel is only one
percent, the processing speed of the kernel will be the
bottleneck in a system with as few as one
hundred processors. Moreover, all the management requests
will be targeted to the processor where the kernel
process runs, resulting in a hot spot in the communication
mechanism. In an operating system for highly parallel
computer systems, management jobs also have to be
distributed.
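
The one-percent figure can be worked through with a simple model. The sketch below assumes, for illustration only, that every processor generates kernel work equal to a fixed percentage of its own work and that all of it is served by one kernel processor; the function names are hypothetical.

```python
import math

def kernel_load(processors, overhead_percent):
    """Demand on a single centralized kernel, as a fraction of one
    processor's capacity. A value of 1.0 means the kernel is saturated."""
    return processors * overhead_percent / 100

def saturation_point(overhead_percent):
    """Smallest processor count at which the kernel is fully busy."""
    return math.ceil(100 / overhead_percent)

for n in (10, 100, 512):
    print(n, kernel_load(n, 1))
```

Under this model a one-percent overhead saturates the kernel at exactly one hundred processors, and any processors beyond that only wait for it.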

Random distribution of management jobs, using a
hashing technique for example, would relieve the bottleneck
problem, but introduces a new problem of frequent
communication, as the requests for operating system services
arise everywhere without regard to where the service is
provided.

To avoid the bottleneck and frequent communication
at the same time, it is essential to distribute
management jobs while keeping the locality of information. PIMOS
thus adopted a hierarchical resource management policy.
User tasks and resources allocated by the operating sys- 
tem form a hierarchical structure. As the design prin- 
ciple leaves computation mapping to application pro- 
grams, processes of PIMOS responsible for management 
jobs will be allocated where requests for services arise, 
and those management processes also form a hierarchi- 
cal structure corresponding to the structure of user tasks, 
called the resource tree. This resource tree is the kernel of
PIMOS.

Figure 4: Task and Management Hierarchies

No centralization of resource management information
is made and no total ordering of resource allocation is
attempted. A management process, which is a node in the
resource tree, knows only of its parent and children.
Allocation of a new resource is handled locally at one level
in the hierarchy without reporting it to upper or
lower levels. When necessary, statistical summaries of
management information are exchanged in the resource
tree, but there is no single process that knows the state
of the whole system precisely. The state of the whole
system can be investigated by traversing the tree structure,
but that would be costly and, because of the concurrent
activities in the system, the information obtained might
already be obsolete when the traversal completes. We
found that this loose management policy works without
any problems.
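
The structure can be pictured with a small Python model. It is a sketch under assumptions: the class `ManagementNode`, its methods and the resource names are hypothetical and show only the locality property described above, not how PIMOS is actually written.

```python
class ManagementNode:
    """One management process in a resource tree.

    A node knows only its parent and its children and keeps the
    resources allocated at its own level.
    """

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.resources = []
        if parent is not None:
            parent.children.append(self)

    def allocate(self, resource):
        """Handle an allocation locally; nothing is reported up or down."""
        self.resources.append(resource)

    def count_resources(self):
        """Whole-subtree traversal: costly, and in a concurrent system
        possibly out of date by the time it returns."""
        return len(self.resources) + sum(
            child.count_resources() for child in self.children
        )


root = ManagementNode("pimos")
shell = ManagementNode("shell", parent=root)
job = ManagementNode("job", parent=shell)
shell.allocate("keyboard")
job.allocate("window")
job.allocate("file")
```

After these allocations the root still holds no record of them; the total of three resources is only discovered by walking the tree.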



4.3.3 Servers 

All the services of PIMOS are provided by servers, which 
correspond to virtualized devices. Servers are realized as 
usual tasks to make the kernel compact and to enable 
easy addition of services. 

An application program (client) requiring a service (to
open a display window, for example) can ask for the
service by sending the kernel a request with the name of the
service. The kernel will look for the named service in a
table it maintains and establish a stream connection
between the server task and the client task, inserting a
filtering process for protection in the client task at the
same time. Once the connection is established, the kernel
will not look into messages passed through the stream;
the server is protected by the inserted filter rather than
a kernel process. When the service is no longer
needed, the client process normally closes the
communication stream. The remaining responsibility of the
kernel is to notify the server of abnormal termination of the
client.



4.4 File System 

Earlier versions of PIMOS operating on an experimental 
model Multi-PSI [Takeda et al. 1990] left all the exter- 
nal input and output to its I/O front-end processor, PSI 
[Nakashima 1987]. This was helpful in rapidly
constructing a software development environment for
applications research. For massive external storage, such as
disks, however, the imbalance between the low-throughput
communication with the I/O front-end and the high-performance
processing power of the parallel hardware became
more apparent with PIM [Taki 1992].

We thus decided to connect disks more directly to
processors of PIM for higher throughput and shorter delay.
To minimize hardware development effort, we adopted
SCSI (small computer system interface) to interface
disks available in the market. Although a single SCSI bus can
provide only rather low throughput, PIM can have many of
them, providing the required total throughput.

As the interface provides only low-level block I/O to
disks, we designed a file system to provide a higher-level
interface to application programs. In designing the file
system, we adopted the following principles.

Distributed Cache: To lower interprocessor
communication frequency, each processor should have its own
cache of file data. The cache mechanism should
provide “Unix semantics”: when one process writes
into a file, the data should become available to other
processes immediately. This is a stricter constraint
than in many distributed file systems, where some
delay is allowed [Levy and Silberschatz 1989], but it is
mandatory in a system like PIMOS, where processes
are usually cooperatively solving one problem. Thus,
a distributed and coherent caching mechanism was
designed, which is similar to the cache coherence mechanisms
provided by snoopy caches [Archibald and Baer 1986]
but allows delay of communication.

Robustness: As all the system components, including
the hardware, the operating system and the file
system itself, are experimental and subject to damage
caused by bugs, a sufficient backup mechanism is
required to provide a comfortable software development
environment. Logging of information vital to the file
system and a quick recovery mechanism using the logged
information were designed.

More detailed description of the file system can be found 
in [Itoh et al. 1992]. 

4.5 Software Development Tools 

Development of parallel software has many aspects dif- 
ferent from development of sequential software. PIMOS 
provides various tools to support development of parallel 
software, described in this section. 

4.5.1 Program Code Management 

Executable programs are provided as data objects of type
module by the kernel language and can be manipulated
through language primitives by authorized software.
Although the representation of executable programs differs
between hardware models, a common interface to
manipulate programs is provided by PIMOS to encapsulate the
differences.

Executable programs are stored in a database, which 
is a virtual device realized by a server task. To maintain 
the logical soundness of the specification, it is not de- 
sirable to introduce the notion of modification, not only 
for usual data but also for programs which are also data. 
Updating a program module does not mean modification 
of an already existing program, which might be running 
in parallel somewhere in the system; it merely means 
updating of the correspondence of module names and 
executable programs kept in the program database. The 
existing processes that are executing the program will 
not be affected by this update. Only when the updated
module is next referenced by its name and looked up in the
database will the new version of it be found.
Multiple versions of the same program can thus coexist in a
system. This not only keeps the semantics clean but also 
allows efficient distributed implementation. 
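
The update-by-rebinding idea can be sketched in Python. This is an illustration only: the class `ProgramDatabase`, its methods and the module name `greet` are hypothetical, and Python functions stand in for executable modules.

```python
class ProgramDatabase:
    """Maps module names to immutable executable programs."""

    def __init__(self):
        self._modules = {}

    def update(self, name, program):
        # Only the name-to-program correspondence changes; the
        # previously registered program object is left untouched.
        self._modules[name] = program

    def lookup(self, name):
        return self._modules[name]


db = ProgramDatabase()
db.update("greet", lambda: "version 1")

running = db.lookup("greet")          # a process that has already started
db.update("greet", lambda: "version 2")

old_result = running()                # still executes the old version
new_result = db.lookup("greet")()     # a fresh lookup finds the new one
```

Since no program is ever modified in place, a processor holding a copy of an old module never needs to be told about the update.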

4.5.2 Debugging Tracer 

The most frequently used tools in debugging programs 
are tracers that allow programmers to look into the de- 
tails of program execution. PIMOS also provides a pro- 
gram tracer for this debugging purpose. 

Execution of a program in a high-level language forms
a hierarchical structure, such as nested subroutine calls.
In the case of subroutines in sequential languages,
substructures corresponding to subroutine invocations directly
correspond to time intervals, such as “during execution
of a subroutine.” Tracing or not tracing a particular
substructure can be effected by switching tracing on and
off during that time interval. In concurrent languages,
such direct correspondence does not exist, as many such
substructures are executed concurrently. If the number
of processes is limited, providing multiple windows, one
for each process, and switching tracing on each of them
might be a good idea. In the case of KL1 programs, the
number of processes typically goes up to millions, far
more than can be handled this way. The tracer of PIMOS also
provides a feature to direct the trace information to
multiple windows, but their role is only auxiliary.

The shoen construct of the kernel language is used to
control tracing, to obtain trace information and to
control execution of traced programs. Each goal executed in
a shoen can be marked as a traced goal. When the
language implementation finds reduction of such a goal to
its subgoals, the newly created subgoals will be reported
from the report stream of the shoen as a message. The
tracer observing the stream presents the information to
the user and queries what to do with the goals, that is,
whether to simply execute them or execute them with
trace marks again. The goals can also be suspended for
a while to control their execution order.

The tracer also has an interface with the deadlock
detection mechanism provided by the KL1 implementation
[Inamura and Onishi 1990].

4.5.3 Performance Tuning 

As stated above, a strong point of the kernel language 
KL1 is that mapping of computation, both over proces- 
sors and over time, can be altered without affecting the 
correctness of programs. Finding a mapping which real- 
izes efficient computation is one of the most important 
research topics in application software research on the 
parallel inference system. 

However, conjecturing mapping only by statically an- 
alyzing programs is a very difficult task. In many cases, 
actually running the programs and gathering statisti- 
cal information reveals many aspects of programs that 
are easily overlooked. To help such experimentation, PI- 
MOS provides a tool for evaluating load distribution al- 
gorithms. 

Profiling information of parallel programs has three 
axes: what, when, and where. In sequential execution, 
“where” is a constant and the “when” is not important, 
since the execution order is strictly designated. Simple 
profiling tools that can tell “what” (which part of the 
program) took how much time will thus suffice. How- 
ever, all three axes are important when parallel execu- 
tion is our concern. The kernel language implementation 
has the feature to provide three-dimensional statistics on 
what (which part of the program, or, in a lower level, 
whether usual computation, interprocessor communica- 
tion or garbage collection) is executed where (on which 
processor) and when. 
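
The three axes can be illustrated with a small Python sketch. The sample data and the helper `project` are assumptions made for the example; they do not reflect the format of the statistics actually produced by the kernel language implementation.

```python
from collections import Counter

# Hypothetical samples: (what, where, when)
#   what  - kind of activity
#   where - processor number
#   when  - time slot
samples = [
    ("compute", 0, 0), ("compute", 1, 0),
    ("communicate", 1, 1), ("gc", 0, 1),
    ("compute", 0, 2), ("compute", 1, 2),
]

WHAT, WHERE, WHEN = 0, 1, 2

def project(samples, axis):
    """Collapse the three-dimensional data onto one axis."""
    return Counter(sample[axis] for sample in samples)

by_what = project(samples, WHAT)    # what a sequential profiler would show
by_where = project(samples, WHERE)  # load per processor
by_when = project(samples, WHEN)    # activity per time slot
```

A sequential profiler needs only the first projection; the other two are what expose load imbalance and idle periods in parallel execution.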

As it is not easy for a human to understand massive 
raw data from hundreds of processors, a profiling tool 
named ParaGraph is provided to analyze the data and 
present it to the user graphically (Figure 5). The sys- 
tem provides displays from several different viewpoints, 
making the analysis easier. The ParaGraph system is 
described in more detail in [Aikawa et al. 1992]. 

4.5.4 Virtual Machine 

As all the communication between user programs and
PIMOS is initiated through the control and report streams
of shoens, a user program can emulate PIMOS by
running programs within a shoen and observing its interface
streams.

The same technique can also be used to debug PIMOS
itself by writing an emulator of the whole parallel
computer system, a virtual machine. This facility provides
a way to debug PIMOS under the software environment
provided by PIMOS itself. As the virtual machine is no
more than a usual task in PIMOS, the protection
mechanism of PIMOS prevents bugs of the debugged version
from propagating to the real PIMOS. Also, the profiling
system ParaGraph can be used for performance tuning
of PIMOS. This facility has been conveniently used in
debugging and tuning of the kernel of PIMOS.

Figure 5: Sample Output of ParaGraph

5 Experiences 

The first version of PIMOS was implemented on
Multi-PSI [Takeda et al. 1990] in 1988 [Chikayama et al. 1988].
It has since been revised with various enhancements and
improvements, through experiences with research and
development of experimental software in many
application areas. As the experiences with application software
are reported elsewhere (see [Nitta et al. 1991] for
example), this section mainly reports the experiences of the
development of PIMOS itself in the kernel language KL1.

5.1 Automatic Synchronization 

The automatic data-flow synchronization mechanism of 
KL1 assured portability of PIMOS to hardware systems 
with different architectures. 

The first version of PIMOS was developed in parallel
with the development of the experimental parallel
inference machine Multi-PSI. During its early development
phase, when no physically parallel system running the
kernel language was available yet, a sequential
implementation was used in the development. The
scheduling of goals was fixed in that implementation. We could
not completely rule out the possibility of crucial
synchronization problems in the system being hidden by the fixed
scheduling of the emulator; this was our first experience
of actually writing large-scale software in KL1.



PIMOS was ported to Multi-PSI when its KL1
implementation became ready. We found almost no
synchronization problems there (except for a small number of
higher-level design problems) although the scheduling on
the real parallel machine is quite different from that of the
emulator. We were certain that this should be the case,
but actually experiencing this made us more confident
of the great merit of writing a system in a language with
automatic data-flow synchronization.

In 1991, the first model of the parallel inference
machines, PIM/m, and its KL1 implementation were made
available for software installation. After the
low-level I/O mechanism was revised to fit this new
platform, PIMOS began working almost immediately on
this system without revealing any problems. This was
not surprising, as the kernel language implementation on
the system used the same scheduling policy as the
Multi-PSI system.

Later in the same year, the system was ported to an
emulator of PIM running on a commercially available
parallel processor. The emulator was primarily for
debugging the design of the kernel language implementation
for models consisting of loosely-coupled clusters, each
of which has several processors sharing a memory bus.
The scheduling policy of this emulator was completely
different from that of Multi-PSI or PIM/m, as the language
implementation distributes goals automatically among
processors in a cluster. As we expected, and yet also to our
surprise, PIMOS itself ran without any problems, while
revealing some problems with the language
implementation instead.

Currently (February 1992), the kernel language imple- 
mentation and PIMOS are being ported to other models 
of PIM. We are now certain that there won’t be any fun- 
damental problems in porting PIMOS to those models. 

5.2 Fine-Grain Concurrency 

It is true that most human algorithm designers are
liable to regard computation as a sequential process and
that some extra effort is needed to think of many
cooperating processes for a single job. This fact is sometimes
taken as an argument against parallel processing, namely
that designing parallel computation is unnatural for humans.
The implicit concurrency of the kernel language, however,
resulted in an interesting phenomenon.

Most algorithms are in fact designed with sequential
processing, or only limited aspects of parallelism,
in mind. Once a program for the algorithm is written
down in the kernel language, the program often shows
much more concurrency than the designer had in mind,
as the language reveals implicit fine-grain concurrency.
The designer can look into the program more objectively
and find different aspects of concurrency implied there.
Sometimes, the concurrency so found is a good candidate
for obtaining larger physical parallelism for increased
efficiency. Mapping pragmas exploiting the concurrency
can then be added to the program to make it run with
higher parallelism and more efficiently. This would not
have been possible if the language had only larger-grain
concurrency.

5.3 Descriptive Power 

Through the development of PIMOS, the descriptive 
power of KL1 for both concurrency and parallelism was 
proved to be sufficient. 

The ability of describing reactive systems allowed the 
language to provide primitives to control external I/O 
devices in a coherent manner; external devices could 
be modeled as an ordinary process without introduc- 
ing any extralogical features to the language. This al- 
lowed straightforward implementation of a virtual ma- 
chine, which helped the development considerably. 

The shoen construct and the priority control mecha- 
nism of the kernel language provided sufficient function- 
ality required to control execution of various activities 
in the system. For example, when a user program runs
into an infinite loop, the following properties make it
possible to interrupt such a program.

• As the device handlers are given higher priority than 
user processes, an interrupt from the keyboard can 
be sensed. 

• As the command shell, which is a user task, lets jobs 
under its control run in a priority lower than itself, 
the shell can sense the interrupt. 

• Using the shoen construct, the shell can stop the 
task in an infinite loop. 

5.4 Ease of Programming 

Many programmers seem to have felt uneasy with the
kernel language when the system first began to be used in
application software development. The largest source of
the problem seems to have been too much freedom of
programming styles.

The bare kernel language allows multiple input/output
modes of logical variables; the same process can read or
write the same shared variable, depending on the situation.
Although this is allowed in the language, it often
introduces race conditions which become problematic only
with specific scheduling. Such a bug is hard to fix, as
tracing the execution or modifying the program to report
debugging information may change the scheduling, hiding
the problem. Gradually, a programming style
has been established where the I/O modes of logical
variables are statically fixed. This indicated a direction
for subsetting the language (see section 6).
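
The fixed-mode discipline can be sketched outside KL1. The
following Python fragment is a hypothetical illustration, not the
semantics of the kernel language: `LogicVar`, `InPort`, `OutPort`,
`new_var`, and `demo` are invented names. Each variable is handed
out as a read-only end and a write-only end, so a given process
can only ever read it or only ever write it.

```python
import threading


class LogicVar:
    """Single-assignment variable; readers block until it is bound."""

    def __init__(self):
        self._bound = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def _bind(self, value):
        with self._lock:
            if self._bound.is_set():
                raise RuntimeError("variable already bound")
            self._value = value
            self._bound.set()

    def _read(self, timeout=None):
        # Event.wait returns False if the timeout expires first.
        if not self._bound.wait(timeout):
            raise TimeoutError("variable still unbound")
        return self._value


class OutPort:
    """Write-only end: the only operation offered is bind()."""

    def __init__(self, var):
        self._var = var

    def bind(self, value):
        self._var._bind(value)


class InPort:
    """Read-only end: the only operation offered is read()."""

    def __init__(self, var):
        self._var = var

    def read(self, timeout=None):
        return self._var._read(timeout)


def new_var():
    """Create a variable and return its (reader, writer) ends."""
    var = LogicVar()
    return InPort(var), OutPort(var)


def demo():
    """One producer thread binds the variable; the caller reads it."""
    reader, writer = new_var()
    producer = threading.Thread(target=writer.bind, args=(42,))
    producer.start()
    value = reader.read(timeout=5)
    producer.join()
    return value
```

Because the reader blocks until the value arrives, the result of
`demo()` does not depend on which thread is scheduled first. That
is the property lost when the same process may either read or
write a shared variable depending on timing.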

Another problem was how to organize numerous concurrent
processes. Many styles have been tried, and
the object-oriented programming style [Shapiro and
Takeuchi 1983] has been accepted as the de facto standard.
Many programming idioms have been established
upon this object-oriented style through experience
[Chikayama 1991], which suggested the direction
of the design of higher level languages (see section 3.3).
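
In this style an object is a process whose state is carried in
its arguments and whose behavior is driven by a stream of
messages. The Python sketch below only imitates that shape: the
`counter` process and its message names are assumptions made for
illustration, and a plain list stands in for the unbound variable
that a message would carry for its reply.

```python
def counter(messages, count=0):
    """Consume a stream of messages; the state is the `count` argument.

    Messages are tuples: ("up",), ("down",), or ("show", reply), where
    `reply` is a list that receives the current count.
    """
    for message in messages:
        tag = message[0]
        if tag == "up":
            count += 1
        elif tag == "down":
            count -= 1
        elif tag == "show":
            message[1].append(count)  # fill in the reply slot
        else:
            raise ValueError(f"unknown message: {message!r}")
    return count  # state when the stream is closed


def demo():
    """Send four messages and return (reply, final state)."""
    reply = []
    final = counter([("up",), ("up",), ("show", reply), ("down",)])
    return reply, final
```

The caller never touches `count` directly; it can only place
messages on the stream, which is what makes the process behave
like an encapsulated object.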

Automatic data-flow synchronization wiped away low-level
synchronization problems, allowing programmers to
concentrate on higher-level issues. With the programming
style established and the software development environment
enhanced on the basis of this experience, describing
parallel software in the kernel language has now become
not much more difficult than writing sequential
programs in other languages for symbolic processing,
such as Lisp.

The largest remaining difficulty is that of designing
computation mapping algorithms for efficient execution.
Separation of correctness and efficiency issues in the
language design and the visual performance analysis tool
facilitated experimentation with mapping algorithms
considerably, but the task is still not easy. Further research
in this direction seems mandatory.
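
The separation of correctness from efficiency can be shown with a
toy model. In the Python sketch below, which is an assumption-laden
illustration and not KL1 pragma syntax, the mapping function
decides only where each task is placed. The names `run`,
`make_tasks`, `all_on_one`, and `round_robin`, and the cost
figures, are hypothetical.

```python
def make_tasks(n=8):
    """Return (cost, thunk) pairs; task k costs k units and yields k*k."""
    return [(k, lambda k=k: k * k) for k in range(1, n + 1)]


def all_on_one(index, n_processors):
    """Mapping policy: place every task on processor 0."""
    return 0


def round_robin(index, n_processors):
    """Mapping policy: deal tasks out to the processors in turn."""
    return index % n_processors


def run(tasks, mapping, n_processors):
    """Evaluate all tasks and return (results, busiest processor load).

    The mapping influences only the load figures, never the results.
    """
    load = [0] * n_processors
    results = []
    for index, (cost, thunk) in enumerate(tasks):
        load[mapping(index, n_processors)] += cost
        results.append(thunk())
    return results, max(load)
```

With eight tasks on four processors, both policies return the same
results, but the busiest processor carries 36 cost units under
`all_on_one` and 12 under `round_robin`. Trying a new policy means
swapping one function, with no risk to the computed answer; finding
a good policy for a real workload remains the hard part.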



6 Future Work 

A problem with the current parallel inference system, 
consisting of parallel inference machines, KL1 implemen- 
tations and PIMOS, is that the system runs only on 
specially devised hardware. Although the system can 
execute KL1 programs very efficiently, requiring special 
hardware is a serious obstacle to sharing the environment
with researchers world-wide. A portable implementation 
of the kernel language working on Unix systems is avail- 
able and was utilized in early stages of software develop- 
ment, but, as it is implemented as an abstract machine 
interpreter, its limited performance makes it inappropri- 
ate for serious experimental studies. 

To solve the problem, research in subsetting the language
to allow more concise and efficient implementations
has been conducted with promising preliminary results
[Ueda and Morita 1990]. A separate effort of
implementing KL1 by translating to C also indicated that
reasonable performance can be obtained with very high
portability [Chikayama 1992]. These results indicate the
possibility of implementing the language efficiently on
stock hardware for use in parallel software research. In
addition to such an implementation, PIMOS, especially
its software development environment, should also be
ported to stock hardware to provide a common basis for
research and development of highly parallel knowledge
information processing systems.

7 Conclusion

An overview of the research and development of the basic
software for the parallel inference system of the FGCS
project has been given.

The system aims at establishing the basis of software 
technology for highly parallel computer systems. The re- 
search and development adopted a middle-out approach 
of designing a programming language first and then con- 
tinuing the design both upwards to the application soft- 
ware and downwards to the hardware architecture simul- 
taneously. The kernel language KL1 and the operating 
system PIMOS were designed and implemented. 

The systems working on experimental parallel infer- 
ence hardware Multi-PSI and a model of parallel infer- 
ence machine PIM have been used in the research and 
development of application software since 1988. Our
experience has shown that the kernel language is expressive
enough for describing an operating system for parallel
processing systems and various application software.
The features of the language that separated correctness 
and efficiency issues, along with the programming envi- 
ronment provided by the operating system, made em- 
pirical research of parallel software much easier than in 
conventional environments. 

Further research in computation mapping is needed in
the future. Development of an efficient and comfortable
environment on stock hardware is another important task
to be done.

Abstract

Knowledge representation languages and knowledge- 
bases play a key role in knowledge information pro- 
cessing systems. In order to support such systems, we 
have developed a knowledge representation language, 
Quixote, a database management system, Kappa, as
the database engine, some applications on Quixote 
and Kappa, and two experimental systems for more 
flexible control mechanisms. 

The whole system can be regarded as falling within
the framework of deductive object-oriented databases
(DOODs) from a database point of view. On the other
hand, from the viewpoint of the many similarities be- 
tween database and natural language processing, it can 
also be considered to support situated inference in the 
sense of situation theory. Our applications have both 
of these features: molecular biological databases and a 
legal reasoning system, TRIAL, for DOOD and a tem- 
poral inference system for situated inference. 

For efficient and flexible control mechanisms, we 
have developed two systems: cu-Prolog based on un- 
fold/fold transformation of constraints and dynamical 
programming based on the dynamics of constraint net- 
works. 

In this paper, we give an overview of R&D ac- 
tivities for databases and knowledge-bases in the 
FGCS project, which are aimed towards an integrated 
knowledge-base management system. 

1 Introduction 

Since the Fifth Generation Computer System (FGCS) 
project started in 1982, many knowledge information 
processing systems have been designed and developed 
as part of the R&D activities in the framework of logic 
and parallelism. Such systems involve various data and
knowledge that are expected to be processed efficiently
in the form of databases and knowledge-bases such as
electronic dictionaries, mathematical databases,
molecular biological databases, and legal precedent
databases¹. Representing and managing such large amounts
of data and knowledge for these systems has been a major
problem. Our activities on databases and knowledge-bases
are also devoted to such data and knowledge under the
logic paradigm.

Since the late seventies, many data models have been
proposed as extensions of the relational model in order
to overcome various disadvantages such as inefficient
representation and inadequate query capability.
Among these extensions, deductive databases attracted
many researchers not only in logic communities but
also in artificial intelligence communities, because of their
logical foundation and strong inference capability. Many
efforts on deductive databases have defined the theoretical
aspects of databases and have shown the powerful
capability of query processing. However, from an application
point of view, the data modeling capability
is rather poor. This is mainly due to representation
based on first-order predicates, which inherits the dis- 
advantages of the relational model. On the other hand, 
object-oriented databases have become popular among 
extensions of the relational model for coping with ‘new’ 
applications such as CAX databases and multi-media 
databases. The flexibility and adaptability of object- 
orientation concepts should be examined also in the 
context of deductive databases, even if object-oriented 
databases have disadvantages such as poor formalism 
and semantic ambiguity. 
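
The inference capability of deductive databases, and the flat
predicate representation they inherit, can both be seen in a very
small example. The Python sketch below is a generic naive
bottom-up evaluation of a recursive rule, not code from Quixote or
Kappa; the `parent` facts and the `ancestors` function are
assumptions chosen for illustration.

```python
def ancestors(parent_facts):
    """Derive every ancestor pair from a set of parent(X, Y) tuples.

    Rules evaluated to a fixpoint:
        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    derived = set(parent_facts)  # first rule: every parent is an ancestor
    while True:
        # Second rule: join parent(X, Y) with ancestor(Y, Z).
        new = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in derived
            if y == y2
        }
        if new <= derived:  # nothing new was derived: fixpoint reached
            return derived
        derived |= new


def demo():
    """Three stored facts yield six derived ancestor pairs."""
    parent = {("a", "b"), ("b", "c"), ("c", "d")}
    return ancestors(parent)
```

Two rules answer a query that a plain relational selection over
the stored tuples cannot express, yet every fact is still a flat
tuple of atomic values. Structured or nested objects have no
direct representation, which is the modeling weakness noted above.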

¹ The boundary between databases and knowledge-bases is
unclear and their usage depends on context. Most database
communities prefer to use the term database even if databases
store a set of rules and have an inference capability such
as deduction and abduction: e.g., deductive databases, expert
databases, and self-organizable databases. In this paper, we
also use the term database according to this convention. The
term knowledge-base in our title shows our view that an
approach based on extensions of databases is a better route to
real knowledge-bases than one based on the conventional
knowledge-bases used by expert systems.




